ECOLOGY, DIVERSITY AND COMPARATIVE GENOMICS OF
OCEANIC CYANOBACTERIAL VIRUSES

By

Matthew Brian Sullivan

Submitted to the Department of Biology in April 2004 in partial fulfillment of the requirements for the degree of Doctor of Philosophy


ABSTRACT 

The marine cyanobacteria Prochlorococcus and Synechococcus are numerically dominant primary producers in the oceans. Each genus consists of multiple physiologically and genetically distinct groups (termed “ecotypes” in Prochlorococcus). Cyanobacterial viruses (cyanophages) that infect Synechococcus are abundant in the oceans (up to 10^4 to 10^6 phage ml^-1), and calculations suggest that they play a small but significant role in host mortality. Cyanophages are also thought to shape their host populations through regulation of sub-populations and through transfer of genes.

Here we describe the isolation of Prochlorococcus cyanophages and the assembly of a culture collection established using a broadly diverse suite of Prochlorococcus and Synechococcus hosts. The collection contains three morphological families known to infect marine bacteria and cyanobacteria: Myoviridae, Podoviridae and Siphoviridae. Host strains of similar ecotypes often yielded cyanophages of the same family. Host-range analyses of these isolates demonstrated varying levels of specificity among the different morphological types, ranging from infection of a single strain to infection across ecotypes and even across both cyanobacterial genera. Strain-specific cyanophage titers were low in open ocean waters where total cyanobacterial abundances were high, suggesting that low phage titers might be a feature of the open ocean. Investigating the underlying cause(s) of this trend requires culture-independent assays for quantifying phages that infect particular hosts. We used the phage g20 gene, which encodes the portal protein, to examine the diversity of Myoviridae isolates and found that g20 sequences from our isolates were highly similar to those from other cultured isolates, but not to six phylogenetic clusters of environmental g20 sequences that lacked cultured representatives.

Three Prochlorococcus cyanophage genomes were sequenced. Analysis of these genomes shows striking similarity to the well-studied T7- and T4-like phages, but additionally suggests that these Prochlorococcus cyanophages are modified for infection of photosynthetic hosts that live in nutrient-limited environments. All three cyanophage genomes contain, among other novel genes of interest, photosynthetic genes that are full-length, conserved, and clustered in the genome, suggesting they are functional during infection. Phylogenetic inference suggests that some of these genes were horizontally transferred between host and phage, influencing the evolution and ecology of both host and phage.


This thesis was co-supervised by Sallie W. Chisholm (MIT Professor of Civil and Environmental Engineering and Biology) and John B. Waterbury (WHOI Professor of Biology).


CHAPTER I. INTRODUCTION


The bacterial viruses (phages) contained in the oceans are the most abundant “biological entities” on the planet (Hendrix et al., 1999). We are interested in understanding the interplay between marine phages and their hosts and how this interplay affects the physiological differentiation and evolution of each. In this thesis, we isolated and examined cyanophages whose hosts are the marine cyanobacteria Prochlorococcus and Synechococcus. These cyanobacteria are the numerically dominant primary producers (Partensky, Hess, and Vaulot, 1999; Waterbury et al., 1986), responsible for 32-89% of primary production in the vast oligotrophic surface oceans, and thus contribute to carbon fixation on a global scale (Goericke, 1993; Li, 1994; Liu, Campbell, and Landry, 1995; Liu et al., 1998; Liu and Landry, 1999; Liu, Nolla, and Campbell, 1997; Veldhuis et al., 1997).

Early culture-based studies using terrestrial microbes as hosts suggested that phages were rare in the oceans (ZoBell, 1946). It was not until appropriate marine host strains were used that indigenous marine phages were clearly demonstrated (Spencer, 1955; Spencer, 1960; Spencer, 1963). Later, electron microscopy (EM) of concentrated seawater samples revealed virus-like particles (VLPs) occurring at concentrations up to 10^8 VLPs ml^-1 in aquatic environments (Bergh, 1989; Borsheim, Bratbak, and Heldal, 1990; Bratbak, 1990; Bratbak et al., 1992; Heldal and Bratbak, 1991; Proctor and Fuhrman, 1990; Sieburth, Johnson, and Hargraves, 1988; Torella and Morita, 1979). Subsequent field studies using EM and epifluorescence microscopy showed that VLP abundances were approximately an order of magnitude higher than total prokaryote counts in seawater, leading to the suggestion that phages may be a significant source of host mortality in marine systems (Bratbak, 1990; Bratbak, 1993; Cottrell and Suttle, 1991; Moebus, 1987; Nagasaki, Tarutani, and Yamaguchi, 1999; Proctor and Fuhrman, 1990; Suttle, 1999; Wommack et al., 1992).

To begin to understand phage-host dynamics and to extrapolate their ecological significance, model marine phage-host systems are critical. Among the few marine microbes that have been successfully cultured (Fuhrman and Campbell, 1998), Prochlorococcus and Synechococcus are not only ecologically important genera (Partensky, Hess, and Vaulot, 1999; Waterbury et al., 1986), but are also unique in having many (80+) strains now in culture. The genetic diversity of cultured Prochlorococcus and Synechococcus isolates has been well characterized using multiple molecular markers, all of which suggest similar phylogenetic topologies (Ferris and Palenik, 1998; Fuller et al., 2003; Moore, Rocap, and Chisholm, 1998; Rocap et al., 2002; Zeidner et al., 2003). Among Prochlorococcus, laboratory work with cultured isolates shows that they form at least two physiologically (Moore and Chisholm, 1999; Moore, Goericke, and Chisholm, 1995; Moore, Rocap, and Chisholm, 1998) and genetically (Moore, Rocap, and Chisholm, 1998; Rocap et al., 2002) distinct groups, which are termed ecotypes. At this broad ecotype level, ribosomal DNA differences are reflected in copper sensitivity (Mann et al., 2002), nutrient utilization (Moore et al., 2002) and genome composition (Dufresne et al., 2003; Rocap et al., 2003). It has been suggested that cyanophages may be partly responsible for this diversity through the regulation of subpopulations of these cyanobacteria as well as through horizontal gene transfer (Marston and Sallee, 2003; Suttle and Chan, 1994; Waterbury and Valois, 1993).

Building a diverse cyanophage collection

Because of the breadth of known genetic diversity among these marine cyanobacteria (Fig. 1), we were interested in developing a cyanophage collection using hosts that represented the rDNA-based genetic diversity known at the time (Rocap et al., 2002). We note that even for these well-studied marine cyanobacteria, culture collections may not yet represent their naturally occurring genetic and physiological diversity (Ahlgren, Hook, and Rocap, 2004; Fuller et al., 2003; Rocap, McKay, and Ahlgren, 2004), but Q-PCR data from seasonal sampling at the Bermuda Atlantic Time-series Study and the Hawaii Ocean Time-series suggest that we often come close (E. Zinser, unpubl. results).

Synechococcus cyanophages have been isolated using a number of host strains, with most work focusing on strain WH7803 (Table 1) (Fuller et al., 1998; Lu, Chen, and Hodson, 2001; Suttle and Chan, 1993; Waterbury and Valois, 1993; Wilson et al., 1993). Synechococcus cyanophage isolates belong to the three phage morphological families (Myoviridae, Podoviridae and Siphoviridae) that have been identified for freshwater cyanobacteria (Safferman et al., 1983), marine bacteria (Moebus, 1987; Wichels et al., 1998) and terrestrial bacteria (Bradley, 1967; Murphy et al., 1995). In this thesis, we created a cyanophage collection by isolating cyanophages using a diverse group of Prochlorococcus and Synechococcus strains as hosts. We characterized these isolates by host range, morphology, a family-specific gene marker, and genome sequencing.
Table 1: Published cyanophage isolates obtained using strains of marine Synechococcus spp. M, P and S indicate that the isolate belongs to the family Myoviridae, Podoviridae or Siphoviridae, respectively. Morphological measurements (nm) and location information are included where available.


Cyanophage | Host strain | Location | M/P/S | Head (nm) | Tail (nm) | Reference
S-BM1 | WH7803 | Bermuda, BBSR | M | 88 | 57 x 33 | Wilson et al. 1993
S-PM1 | WH7803 | Plymouth Sound (50°18'N, 4°12'W) | M | 88 | 57 x 33 | Wilson et al. 1993
S-PM2 (formerly S-PS1) | WH7803 | Plymouth Sound (50°18'N, 4°12'W) | M | 90 | 165 x 20 | Wilson et al. 1993
S-WHM1 | WH7803 | WHOI | M | 88 | 108 x 23 | Wilson et al. 1993
S-BM2 (formerly S-BS1) | WH7803 | Bermuda, BBSR | M | 90 | 165 x 20 | Wilson et al. 1993
S-BM3 | WH7803 | Hydrostation S (30°10'N, 64°30'W) | M | 110 | 230 x 25 | Fuller et al. 1998
S-BM4 | WH7803 | West coast of Bermuda | M | 125 | 190 x 35 | Fuller et al. 1998
S-BM5 | WH7803 | West coast of Bermuda | M | 125 | 190 x 35 | Fuller et al. 1998
S-BM6 | WH7803 | Hydrostation S (30°10'N, 64°30'W) | M | 110 | 200 x 25 | Fuller et al. 1998
S-BP1 | WH8018 | West coast of Bermuda | P | 125 | | Fuller et al. 1998
S-BP2 | WH8018 | West coast of Bermuda | P | 75 | | Fuller et al. 1998
S-BP3 | WH7803 | West coast of Bermuda | P | 70 | | Fuller et al. 1998
S-BnM1 | WH7803 | Bergen, Norway | M | | | Wilson PhD thesis 1994
S-RSM1 | WH7803 | Red Sea, Eilat, Israel | M | | | Wilson PhD thesis 1994
S-RSM2 | WH7803 | Red Sea, Eilat, Israel | M | | | Wilson PhD thesis 1994
S-MM1 | WH7803 | Miami, FL | M | | | Wilson PhD thesis 1994
S-MM2 | WH7803 | Miami, FL | M | | | Wilson PhD thesis 1994
S-MM3 | WH7803 | Miami, FL | M | | | Wilson PhD thesis 1994
S-MM4 | WH7803 | Miami, FL | M | | | Wilson PhD thesis 1994
S-MM5 | WH7803 | Miami, FL | M | | | Wilson PhD thesis 1994
S-MM7 | WH7803 | Miami, FL | M | | | Wilson PhD thesis 1994
S-PWM1 | WH7803 | | M | | | Suttle & Chan 1993
S-PWM2 | WH7803 | | M | | | Suttle & Chan 1993
S-PWM3 | | | M | | | Suttle & Chan 1993
S-PWM4 | | | M | | | Suttle & Chan 1993
S-BBS1 | BBC1 | | S | 50 | 230 x ? | Suttle & Chan 1993
S-BBP1 | BBC2 | | P | 50 | | Suttle & Chan 1993
S-PWP1 | SNC1 | | P | 65 | | Suttle & Chan 1993
Syn1 | WH8101 | WHOI | M | | | Waterbury & Valois 1993
Syn2 | WH8012 | Sargasso Sea (34°06'N, 61°01'W) | M | 66 | 149 x 19 | Waterbury & Valois 1993
Syn5 | WH8109 | Sargasso Sea (34°06'N, 61°01'W) | P | | | Waterbury & Valois 1993
Syn7 | WH8012 | WHOI | M | | | Waterbury & Valois 1993
Syn9 | WH8012 | WHOI | M | 87 | 153 x 19 | Waterbury & Valois 1993
Syn10 | WH8017 | Gulf Stream (36°58'N, 73°42'W) | M | 100 | 145 x 19 | Waterbury & Valois 1993
Syn12 | WH8017 | Gulf Stream (36°58'N, 73°42'W) | P | 45 | 8 x 10 | Waterbury & Valois 1993
Syn14 | WH8103 | Gulf Stream (36°58'N, 73°42'W) | M | 93 | 136 x 21 | Waterbury & Valois 1993
Syn16 | WH8018 | Sargasso Sea (34°06'N, 61°01'W) | M | | | Waterbury & Valois 1993
Syn17 | WH8018 | Gulf Stream (36°58'N, 73°42'W) | M | | | Waterbury & Valois 1993
Syn18 | WH8108 | Sargasso Sea (34°06'N, 61°01'W) | M | | | Waterbury & Valois 1993
Syn19 | WH8109 | Sargasso Sea (34°06'N, 61°01'W) | M | | | Waterbury & Valois 1993
Syn20 | WH8109 | Gulf Stream (36°58'N, 73°42'W) | M | | | Waterbury & Valois 1993
Syn22 | WH8109 | Gulf Stream (36°58'N, 73°42'W) | M | | | Waterbury & Valois 1993
P3 | WH7805 | Sapelo Island, GA | M | | | Lu, Chen & Hodson 2001
P5 | WH7803 | Sapelo Island, GA | M | | | Lu, Chen & Hodson 2001
P6 | WH7805 | Dauphin Island, GA | M | | | Lu, Chen & Hodson 2001
P8 | WH8101 | Sapelo Island, GA | M | | | Lu, Chen & Hodson 2001
P12 | WH8101 | Sayll Estuary, Ala. | M | | | Lu, Chen & Hodson 2001
P16 | WH7803 | Savannah River Estuary, GA | M | | | Lu, Chen & Hodson 2001
P17 | WH7803 | Qingdao Coast, China | M | | | Lu, Chen & Hodson 2001
P39 | WH7805 | Savannah River Estuary, GA | M | | | Lu, Chen & Hodson 2001
P61 | WH7803 | Satilla River Estuary, GA | M | | | Lu, Chen & Hodson 2001
P66 | WH7803 | Satilla River Estuary, GA | M | | | Lu, Chen & Hodson 2001
P73 | WH7805 | Satilla River Estuary, GA | M | | | Lu, Chen & Hodson 2001
P76 | WH8007 | Altamaha River Estuary, GA | M | | | Lu, Chen & Hodson 2001
P77 | WH8007 | Altamaha River Estuary, GA | M | | | Lu, Chen & Hodson 2001
P79 | WH7805 | Satilla River Estuary, GA | M | | | Lu, Chen & Hodson 2001
P81 | WH7805 | Altamaha River Estuary, GA | M | | | Lu, Chen & Hodson 2001
P1 | WH7803 | Sayll Estuary, Ala. | S | | | Lu, Chen & Hodson 2001
Cyanophage host range and field abundance

As the number of fully sequenced phage and microbial genomes increases, their analyses are revealing the importance of horizontal gene transfer (HGT) as an evolutionary force among prokaryotes and their phages (Hendrix et al., 1999; Ochman, Lawrence, and Groisman, 2000). Although we know that some phages can move genes between hosts (Jiang and Paul, 1998; Miller, 2001; Miller et al., 1992; Paul, 1999), and that phages can help shape microbial population structure (Fuhrman, 1999; Mann, 2003; Suttle and Chan, 1994; Waterbury and Valois, 1993), we are still working to understand the mechanisms by which, and the extent to which, phages influence the evolution of their hosts. To begin to understand the basic natural history of the cyanophages in our collection, we examined the host range of 45 cyanophage isolates using 21 cultured Prochlorococcus and Synechococcus strains (Chapter 2). Do phages isolated using Prochlorococcus cross-infect other strains of Prochlorococcus? Of Synechococcus?

Among the broad diversity of phages, there are two well-described phage lifestyles: lytic (discussed first) and temperate (discussed in a later section).

Lytic phages infect their host, use the host's cellular processes to build new phage particles, and then burst the host cell, releasing progeny phage. Strain-specific lytic Synechococcus cyanophage titers measured in mostly coastal marine systems showed that these cyanophages are abundant (up to 10^4 to 10^6 ml^-1), with the highest Synechococcus cyanophage titers remaining within an order of magnitude of total Synechococcus concentrations (Lu, Chen, and Hodson, 2001; Suttle and Chan, 1994; Waterbury and Valois, 1993). Calculations from such field abundances and other observations (reviewed in Mann, 2003) suggest that, in marine systems, lytic phages can be responsible for up to 50% of bacterial mortality per day (Fuhrman, 1999) but only a small (<3%) portion of cyanobacterial mortality per day (Mann, 2003; Suttle, 2000; Suttle and Chan, 1994; Waterbury and Valois, 1993). However, the open ocean systems dominated by Prochlorococcus (DuRand, Olson, and Chisholm, 2001; Partensky, Hess, and Vaulot, 1999) offer a contrast to the mostly coastal environments of previous Synechococcus cyanophage studies.

These oligotrophic regions likely differ from coastal environments in three ways that could affect assays measuring phage abundances: the types of dominant cyanobacterial cells (Ferris and Palenik, 1998), the total diversity of cell populations (Fuhrman, 2000), and nutrient limitation (Cavender-Bares, Karl, and Chisholm, 2001; Wu et al., 2000). First, culture-based assays might better represent naturally occurring cells in one environment than in another, creating the appearance of cyanophage titers that change along a coastal-to-open-ocean transect. Second, if host cell diversity changes along such a gradient, phage titers may change accordingly, since titers are predicted to decrease where host cell diversity increases (Thingstad, 2000). Finally, decreased nutrient availability along the transect (Cavender-Bares, Karl, and Chisholm, 2001) might result in sub-optimal growth of host cells in the Sargasso Sea (Mann et al., 2002) relative to Synechococcus at the coastal end of the transect (Waterbury et al., 1986). Viral production is correlated with host growth rate in chemostats (Bohannan and Lenski, 2000) and in the field (Steward, Smith, and Azam, 1996), a relationship that could result from nutrient limitation causing physiological changes in the host that stall the lytic process of obligately lytic phages (Stent, 1963) or favor lysogeny in temperate phages (Rohwer et al., 2000; Suttle, 2000; Wilson, Carr, and Mann, 1996).

Previous work suggests that phosphate, in particular, is the nutrient most likely to influence phage abundances along such a transect. Mesocosm experiments with Emiliania huxleyi viruses showed that viral production did not occur under phosphate-deplete conditions (Bratbak, 1993). Infection of Synechococcus WH7803 by cyanophage S-PM2 under various nutrient stress conditions showed a significant (80%) reduction in burst size (i.e., phage produced per cell) under phosphate-deplete (but not nitrate-deplete) conditions relative to phosphate-replete conditions (Wilson, Carr, and Mann, 1996). Subsequent work in seawater mesocosms with natural communities of Synechococcus showed that phosphate additions to phosphate-deplete enclosures terminated a Synechococcus bloom, which was interpreted as phosphate stimulating viral production through the induction of prophage within the cyanobacterial genomes (Wilson and Mann, 1997). Finally, the phoH gene, which in E. coli is a phosphate-induced gene that encodes a putative ATPase, is found in the genomes of two phages that infect marine bacterial hosts (Miller et al., 2003; Rohwer et al., 2000). Together, these observations suggest that phosphate is important to the production of phages in marine systems. We examined strain-specific cyanophage titers along a coastal-to-open Atlantic Ocean transect (Chapter 2). How do Prochlorococcus and Synechococcus phage titers vary along a transect that includes gradients in the types of dominant cells, total cell diversity and bioavailable nutrients?

Phage portal protein gene diversity in myoviruses

An understanding of the roles of marine phages (indeed all phages) in their communities has been hampered by the lack of a universal gene (analogous to the 16S rRNA gene that is used as the taxonomic marker for all microbes) that could be used to investigate phage genetic diversity (Paul et al., 2002). For this reason, studying the genetic diversity of both cultured and wild phages has proven difficult. Recently, family-specific genes have been proposed as taxonomic tools (Rohwer and Edwards, 2002). For the Myoviridae, which are highly represented among Synechococcus cyanophage isolates (Fuller et al., 1998; Lu, Chen, and Hodson, 2001; Suttle and Chan, 1993; Waterbury and Valois, 1993; Wilson et al., 1993), the family-specific gene commonly used is a homolog of the coliphage T4 portal protein gene, g20 (Fuller et al., 1998; Zhong et al., 2002). Many field studies have now examined myophage (phages of the morphological family Myoviridae) g20 sequence diversity in a variety of aquatic environments using DGGE banding patterns (Dorigo, Jacquet, and Humbert, 2004; Frederickson, Short, and Suttle, 2003; Wilson et al., 1999; Wilson et al., 2000), cloning and sequencing (Dorigo, Jacquet, and Humbert, 2004; Zhong et al., 2002) and/or culturing and sequencing (Marston and Sallee, 2003; Zhong et al., 2002). We examined the diversity of g20 sequences from the Myoviridae in this cyanophage collection (Chapter 3). Do the g20 sequences of new cyanophage isolates represent novel g20 sequence types? Are they genetically related to g20 sequences from known cyanophage isolates or to unknown environmental g20 sequences? Are existing PCR primer sets adequately capturing the full g20 sequence diversity? Does g20 diversity correspond to ecologically relevant phage traits?

Searching for prophage in host cyanobacterial genomes

In contrast to lytic phages, temperate phages do not always immediately lyse their hosts, but instead can temporarily insert their DNA into the host genome as a prophage. The existence of inducible prophage (prophage that can be ‘induced’ out of the host genome to enter the lytic phase of the temperate phage life cycle) was first elegantly demonstrated at the single-cell level over half a century ago (Lwoff, 1953). Prophage induction has since been repeatedly demonstrated in natural microbial isolates, and prophage have recently been identified in most sequenced microbial genomes (reviewed in Casjens, 2003). Prophage are so common that they often account for a significant fraction of the “strain-specific” DNA between closely related microbial strains (Baba et al., 2002; Simpson et al., 2000; Smoot et al., 2002); furthermore, their genes are highly expressed, as seen in genome-wide expression arrays under varying conditions (Smoot et al., 2001; Whiteley et al., 2001).

Together, these findings emphasize that prophage are not only widespread in prokaryotes, but also frequently account for strain diversification at both the genome and transcriptome (expression) levels, often altering the host cell’s physiology. However, currently available freshwater cyanobacterial genomes lack intact prophage (Canchaya et al., 2003; Casjens, 2003), and no direct observation has confirmed the existence of prophage in marine cyanobacteria. We examined the genome sequences of three marine cyanobacteria (Prochlorococcus strains MED4 and MIT9313 and Synechococcus WH8102) for the presence of intact prophage (Appendix B). Are there intact prophage genomes within marine cyanobacterial genomes? If yes, what type of phage do their genomes most resemble? If not, could potential prophage have been induced during cultivation? Is there any evidence suggesting that prophage ever integrated into these genomes?

Cyanophage genomics

Phage genomes represent the largest unexplored reservoir of sequence information in the biosphere (Pedulla et al., 2003). Calculations using two independent methods suggest that less than 0.0002% of this reservoir has been sampled (Rohwer, 2003). First, combining the estimated number of phage types (100 million) and an average genome size of 50 ORFs with an extrapolation from the proportion of phage-encoded ORFs in uncultured genomic DNA sequences that lack known function suggests that 2.5 billion phage-encoded ORFs remain to be discovered. Second, applying the non-parametric richness estimator Chao1 to comparisons of every phage-encoded ORF against every other predicts that 2 billion phage-encoded ORFs remain to be discovered. In fact, nearly every new phage genome sequenced leads to novel insight into the role phages play in their hosts' physiology and niche differentiation (Brussow and Hendrix, 2002).
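The Chao1 estimator mentioned above extrapolates total richness from how many sequence types are observed exactly once (singletons, F1) or exactly twice (doubletons, F2); in its classic form, S_Chao1 = S_obs + F1^2 / (2 F2). The Python sketch below illustrates that formula only. The function names and example counts are hypothetical, the input is assumed to be the number of times each ORF cluster was observed, and the clustering step used in the analysis cited above is not shown.

```python
from collections import Counter
from typing import Hashable, Iterable


def chao1(abundances: Iterable[int]) -> float:
    """Classic Chao1 richness estimate from per-type observation counts.

    Hypothetical helper for illustration: each entry is how many times one
    sequence type (e.g. an ORF cluster) was observed in a sample.
    """
    counts = [a for a in abundances if a > 0]  # ignore unobserved types
    s_obs = len(counts)                         # observed richness
    f1 = sum(1 for a in counts if a == 1)       # singletons
    f2 = sum(1 for a in counts if a == 2)       # doubletons
    if f2 > 0:
        return s_obs + (f1 * f1) / (2 * f2)
    # No doubletons: use the bias-corrected form S_obs + F1(F1-1)/(2(F2+1)),
    # which reduces to S_obs + F1(F1-1)/2 when F2 = 0.
    return s_obs + f1 * (f1 - 1) / 2


def chao1_from_labels(labels: Iterable[Hashable]) -> float:
    """Chao1 from cluster labels, one label per observed sequence."""
    return chao1(Counter(labels).values())


if __name__ == "__main__":
    # Hypothetical abundances: 3 singletons, 2 doubletons, 2 common types
    example = [1, 1, 1, 2, 2, 5, 10]
    print(chao1(example))  # 7 observed + 3*3/(2*2) = 9.25
```

The larger the ratio of singletons to doubletons, the further the estimate rises above the observed richness, which is why samples in which most types are seen only once imply that much of the diversity remains unsampled.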

Since the first phage genome was sequenced over a quarter of a century ago (Sanger et al., 1977), observations from over 150 phage genomes (primarily pathogen-related) have taught us a great deal. First, our models of phage evolution have changed to reflect the fact that phages are modular and evolve through the homologous and non-homologous exchange of genes from a common gene pool (Hendrix, 2002; Hendrix et al., 1999). Second, the genetic sloppiness of phages, which is compensated for by their large numbers of progeny, allows the acquisition of “non-phage” genes that most often decrease phage fitness and are lost from the phage population, but occasionally provide an advantage to the phage and/or host and thus become fixed in the phage population (Hendrix et al., 2000). These events, though rare, can create entirely new phage-host evolutionary dynamics, even defining new host capabilities, and thus are vital to the physiology, ecology and evolution of both the phage and its host.

Marine phage genomics is in its fledgling stage, with, at the time of this writing, only a handful of published genomes (Paul et al., 2002). Currently, there are over 170 phage genomes in GenBank, but only seven infect marine hosts (cyanophage P60; vibriophages VpV262, KVP40, VP16T and VP16C; roseophage SIO1; Pseudoalteromonas phage PM2), and only one of these is a cyanophage (P60). However, genomic analyses have already provided tantalizing hints that, just as phages are connected to their hosts' ecology in pathogen phage-host systems, marine phages appear intimately tied to the ecological setting and physiological state of their hosts. For example, phosphate is a commonly limiting nutrient in marine systems (Cavender-Bares, Karl, and Chisholm, 2001; Wu et al., 2000), and the discovery of phosphate-inducible genes in roseophage SIO1 (Rohwer et al., 2000) and vibriophage KVP40 (Miller et al., 2003) suggests that these phages respond to the phosphate status of their hosts and/or environment. Data in this thesis from analyses of three cyanophage genomes also support a strong connection between the phages and their hosts' ecology (Chapters 4 and 5). Through collaboration with Forest Rohwer (San Diego State University), David Mead (Lucigen) and the Department of Energy Joint Genome Institute, we sequenced the genomes of three Prochlorococcus cyanophage isolates representing two viral families (Podoviridae and Myoviridae). Will the gene complements of these cyanophages isolated from nutrient-limited waters provide insight into properties of these phages that are unique to the oligotrophic ocean environment of their hosts? Recently, core photosynthetic genes were discovered in a Synechococcus cyanophage, and it was hypothesized that these genes are expressed, enabling photosynthesis to continue during infection (Mann et al., 2003). Do the gene complements of these Prochlorococcus cyanophages also include photosynthetic genes? Can we use the genetic information in these genomes to infer whether phages influence the genomes of their hosts? Vice versa?

Comparative phage genomics suggests that dsDNA phages evolve through the exchange of genetic material, in the form of modular functional cassettes, through a global phage genome pool (Hendrix et al., 1999). To explain the observation that phage genomes appear to be mosaics, containing a large number of fixed, essential genes interspersed with highly variable, non-essential genes (Desplats and Krisch, 2003; Hendrix et al., 1999; Molineux, in press), it has been suggested that access to this global phage genome pool must be limited in some manner (Hendrix et al., 1999). This theory that phages evolve primarily through the horizontal exchange of genes appears to apply to the few phage types (predominantly temperate phages, i.e., those capable of integrating into their host genomes) whose genomes are well represented in the phage genome databases. These include the lambdoid phages (capable of genetically recombining with coliphage lambda) (Hendrix et al., 1999), dairy phages (phages that infect lactic acid bacteria) (Brussow, 2001) and the mycobacteriophages (phages that infect mycobacteria) (Pedulla et al., 2003). However, fundamental differences in phage infection strategies (i.e., temperate phages integrate into host genomes whereas lytic phages do not) may lead to quantitatively different opportunities for horizontal gene exchange, and it remains an open question whether lytic phage genomes might be less prone to mosaicism (Kovalyova and Kropinski, 2003). Further, if all phages are extensively transferring genes horizontally, one may wonder whether significantly new phage types will be discovered as more phage genomes are sequenced, or whether only a limited number of phage types is possible. Because cyanobacterial phages are underrepresented in the phage genome database (only 1 cyanophage out of over 170 phage genomes), the Prochlorococcus cyanophage genomes sequenced during this thesis offer a comparison with phages that infect hosts phylogenetically distant from those represented in the database. How do these new Prochlorococcus phage genomes compare to those already sequenced? Are their genomes organized similarly to those of well-characterized phages? What kinds of inferences can we draw about phage-host interactions from the presence or absence of ‘non-phage-like’ genes in these phage genomes?
Prochlorococcus is the numerically dominant phototroph in the tropical and subtropical oceans, accounting for half of the photosynthetic biomass in some areas1,2. Here we report the isolation of cyanophages that infect Prochlorococcus, and show that although some are host-strain-specific, others cross-infect with closely related marine Synechococcus as well as between high-light- and low-light-adapted Prochlorococcus isolates, suggesting a mechanism for horizontal gene transfer. High-light-adapted Prochlorococcus hosts yielded Podoviridae exclusively, which were extremely host-specific, whereas low-light-adapted Prochlorococcus and all strains of Synechococcus yielded primarily Myoviridae, which have broad host ranges. Finally, both Prochlorococcus and Synechococcus strain-specific cyanophage titres were low (<10³ ml⁻¹) in stratified oligotrophic waters even where total cyanobacterial abundances were high (>10⁵ cells ml⁻¹). These low titres in areas of high total host cell abundance seem to be a feature of open ocean ecosystems. We hypothesize that gradients in cyanobacterial population diversity, growth rates and/or the incidence of lysogeny underlie these trends.

Phages are thought to evolve by the exchange of genes drawn from a common gene pool, with differential access imposed by host range limitations3. Similarly, horizontal gene transfer, important in microbial evolution4,5, can be mediated by phages6 and is probably responsible for many of the differences in the genomes of closely related microbes7. Recent detailed analyses of molecular phylogenies constructed for marine Prochlorococcus and Synechococcus8 (Fig. 1) show that these genera form a single group within the marine picophytoplankton clade9 (>96% identity in 16S ribosomal DNA sequences), yet display microdiversity in the form of ten well-defined subgroups8. We have used members of these two groups to study whether phages isolated on a particular host strain cross-infect other hosts and, if so, whether the probability of cross-infection is related to the rDNA-based evolutionary distance between the hosts.


Analyses of host range were conducted (Fig. 1) with 44 cyanophages, isolated as previously described10 from a variety of water depths and locations (see Supplementary Information), using 20 different host strains chosen to represent the genetic diversity of Prochlorococcus and Synechococcus8. Although we did not examine how these patterns would change if phages were propagated on different hosts, doing so would undoubtedly add another layer of complexity, because methylation of phage DNA can modify host range6. Similar to those that infect other marine bacteria11 and Synechococcus12–14, our Prochlorococcus cyanophage isolates fell into three morphological families: Myoviridae, Siphoviridae and Podoviridae15.

As would be predicted10–13, Podoviridae were extremely host specific, with only two cross-infections out of a possible 300 (Fig. 1). Similarly, the two Siphoviridae isolated were specific to their hosts. In instances of extreme host specificity, in situ host abundance would need to be high enough to facilitate phage-host contact. It is noteworthy in this regard that members of the high-light-adapted Prochlorococcus cluster, which yielded the most host-specific cyanophages, have high relative abundances in situ16. The Myoviridae exhibited much broader host ranges, with 102 cross-infections out of a possible 539. They cross-infected not only among and between Prochlorococcus ecotypes but also between Prochlorococcus and Synechococcus. Those isolated with Synechococcus host strains have broader host ranges and are more likely to cross-infect low-light-adapted than high-light-adapted Prochlorococcus strains.
The low-light-adapted Prochlorococcus are less diverged from Synechococcus than high-light-adapted Prochlorococcus8,9, suggesting a relationship, in this instance, between the probability of cross-infection and the rDNA relatedness of hosts. Finally, we tested the Myoviridae for cross-infection against marine bacterial isolates closely related to Pseudoalteromonas, which are known to be broadly susceptible to diverse bacteriophages (bacterial strains HER1320, HER1321, HER1327 and HER1328)11. None of the Myoviridae cyanophages infected these bacteria.
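The cross-infection frequencies above are simple proportions of tested phage-host pairings. The sketch below is a minimal illustration, not the procedure used in this study: it tallies such proportions from a host-range matrix coded with the Fig. 1 symbols. The symbol strings, the toy phage and host names, and the rule of excluding each phage's isolation host from the denominator are assumptions made for illustration.

```python
# Minimal sketch: tally cross-infections from a phage x host matrix.
# Cell codes mirror the Fig. 1 legend (assumed string labels):
#   "filled" = isolation host, "open" = cross-infection, "dash" = no lysis.

HostRange = dict[str, dict[str, str]]


def tally(matrix: HostRange) -> tuple[int, int]:
    """Return (cross-infections observed, pairings tested), excluding isolation hosts."""
    observed = possible = 0
    for row in matrix.values():
        for symbol in row.values():
            if symbol == "filled":
                continue  # the isolation host is not a cross-infection
            possible += 1
            if symbol == "open":
                observed += 1
    return observed, possible


def cross_infection_rate(observed: int, possible: int) -> float:
    """Fraction of tested pairings that resulted in lysis."""
    if possible <= 0:
        raise ValueError("possible must be positive")
    return observed / possible


# Hypothetical toy matrix (not data from this study).
TOY: HostRange = {
    "phage_A": {"host_1": "filled", "host_2": "open", "host_3": "dash"},
    "phage_B": {"host_1": "dash", "host_2": "filled", "host_3": "dash"},
}

# Counts reported in the text above.
REPORTED = {"Podoviridae": (2, 300), "Myoviridae": (102, 539)}

if __name__ == "__main__":
    print("toy matrix:", tally(TOY))
    for family, (obs, poss) in REPORTED.items():
        print(f"{family}: {obs}/{poss} = {cross_infection_rate(obs, poss):.1%}")
```

With the reported counts, the Podoviridae lysed about 0.7% of tested pairings and the Myoviridae about 18.9%.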

Phage morphotypes isolated were determined, to some degree, by the host used for isolation (Fig. 1). For example, ten of ten cyanophages isolated using high-light-adapted Prochlorococcus strains were Podoviridae. In contrast, all but two cyanophages isolated on Synechococcus were Myoviridae, a bias that has been reported by others14, and over half of those isolated on low-light-adapted Prochlorococcus belonged to this morphotype. We further substantiated these trends by examining lysates (as opposed to plaque-purified isolates) from a range of host strains, geographic locations and depths: of 58 Synechococcus lysates, 93% contained Myoviridae; of 43 low-light-adapted Prochlorococcus lysates, 65% contained Myoviridae; and of 107 high-light-adapted Prochlorococcus lysates, 98% contained Podoviridae (see Supplementary Information).
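Because the lysate percentages above are rounded, the underlying counts are not stated. The hedged sketch below back-calculates which whole-number counts are consistent with each rounded percentage; it assumes conventional rounding to the nearest whole percent, which the source does not specify.

```python
# Sketch: recover lysate counts consistent with a rounded percentage.
# Assumes percentages were rounded to the nearest whole number.

SURVEY = [
    # (morphotype, isolation host group, lysates examined, % containing morphotype)
    ("Myoviridae", "Synechococcus", 58, 93),
    ("Myoviridae", "low-light-adapted Prochlorococcus", 43, 65),
    ("Podoviridae", "high-light-adapted Prochlorococcus", 107, 98),
]


def consistent_counts(total: int, percent: int) -> list[int]:
    """All counts k (0..total) whose percentage of `total` rounds to `percent`."""
    return [k for k in range(total + 1) if round(100 * k / total) == percent]


if __name__ == "__main__":
    for morph, hosts, total, pct in SURVEY:
        print(f"{morph} in {hosts}: {pct}% of {total} -> {consistent_counts(total, pct)}")
```

Under that rounding assumption each percentage maps to a single count (54 of 58, 28 of 43 and 105 of 107).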

Maximum cyanophage titres, using a variety of Synechococcus hosts, are usually found to be within an order of magnitude of the total Synechococcus abundance10,14,17,18, and can be as high as 10⁶ phage ml⁻¹. One study17 has shown, for example, that along a
transect in which total Synechococcus abundance decreased to 250 cells ml⁻¹, maximum cyanophage titres remained at least as high as the total number of Synechococcus.
We wondered whether titres of Prochlorococcus cyanophages in the Sargasso Sea, where Prochlorococcus cells are abundant (10⁵ cells ml⁻¹), would be comparable to those measured for Synechococcus in coastal oceans, where total Synechococcus host abundances are of similar magnitude. We assayed cyanophage titres in a depth profile in the Sargasso Sea at the end of seasonal stratification using 11 strains of Prochlorococcus (Fig. 2), choosing at least one host strain from each of the six phylogenetic clusters that span the rDNA-based genetic diversity of our culture collection8.

Three Prochlorococcus host strains (MIT 9303, MIT 9313 and SS120) yielded low or no cyanophage titres. Other hosts yielded titres that reached a maximum at 70 m (NATL2A-phage) or 100 m (MIT 9302-, MIT 9515-, MED4-, MIT 9211- and NATL1A-phage), near the depth of maximum Prochlorococcus abundance (Fig. 2). All Prochlorococcus cyanophage titres were low (<350 cyanophage ml⁻¹) compared with those reported for Synechococcus in coastal regions (approximately 10⁴–10⁶ cyanophage ml⁻¹), even though total host abundances were similar between these regions (approximately 10⁵ cells ml⁻¹)10,14,17,18. Prochlorococcus cyanophage titres are comparable to those of Synechococcus from oligotrophic waters in the Gulf of Mexico, but in that instance the total Synechococcus abundance was also low (<250 cells ml⁻¹)17.
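The comparison above is easier to see as phage-to-host ratios. The sketch uses only the approximate values quoted in this paragraph (titres below 350 phage ml⁻¹ at the Sargasso station, 10⁴–10⁶ cyanophage ml⁻¹ in coastal waters, and roughly 10⁵ host cells ml⁻¹ in both settings); it is order-of-magnitude arithmetic, not a reanalysis of the data.

```python
# Order-of-magnitude comparison of strain-specific phage titre to host abundance,
# using the approximate values quoted in the text.


def phage_to_host_ratio(phage_per_ml: float, cells_per_ml: float) -> float:
    """Phage particles per host cell."""
    if cells_per_ml <= 0:
        raise ValueError("cells_per_ml must be positive")
    return phage_per_ml / cells_per_ml


HOSTS = 1e5  # ~10^5 cells per ml in both settings

sargasso_max = phage_to_host_ratio(350, HOSTS)  # Prochlorococcus phage, Sargasso (upper bound)
coastal_min = phage_to_host_ratio(1e4, HOSTS)   # Synechococcus phage, coastal (low end)
coastal_max = phage_to_host_ratio(1e6, HOSTS)   # Synechococcus phage, coastal (high end)

if __name__ == "__main__":
    print(f"Sargasso (Prochlorococcus): < {sargasso_max:.4f} phage per cell")
    print(f"Coastal (Synechococcus): {coastal_min:g} to {coastal_max:g} phage per cell")
    print(f"Gap at the closest ends: > {coastal_min / sargasso_max:.0f}-fold")
```

Even at the closest ends of the two ranges, the open-ocean ratio is nearly 30-fold lower.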

Cyanophage titres were also examined along a surface water transect from coastal (mesotrophic) to open ocean (oligotrophic) waters in the Atlantic Ocean to better understand the relationship between maximum phage titre and total host abundance along a trophic gradient. Titres were assayed with 12 strains of Synechococcus and Prochlorococcus that represented the known rDNA-based genetic diversity at the time that we began the study4 (but see also ref. 19). We found that Synechococcus cyanophage titres decreased by an order of magnitude or more in surface waters between the coastal and open ocean (Sargasso) sites, whereas total Synechococcus abundance decreased from 3 × 10⁴ to 7 × 10³ cells ml⁻¹ (Fig. 3). Prochlorococcus hosts did not yield cyanophages in coastal samples, where there are no Prochlorococcus cells, and yielded relatively low titres (0 to 1.5 × 10³ phage ml⁻¹) at the shelf, slope and Sargasso stations, where total Prochlorococcus abundance was between 4.5 × 10⁴ and 1.4 × 10⁵ cells ml⁻¹. Even though total Prochlorococcus abundance at the Sargasso site was similar to that of Synechococcus at the coastal site (Fig. 3i, j), Prochlorococcus and Synechococcus cyanophage titres were significantly lower at the open ocean site (Fig. 3a–h). Moreover, regardless of the host used, titres never exceeded 3 × 10³ cyanophage ml⁻¹ at any depth throughout the photic zone, even though total Prochlorococcus abundances exceeded 10⁵ cells ml⁻¹ (see Supplementary Information). Thus cyanophage titres at the end of summer stratification appear to be relatively low in open ocean ecosystems, where total potential host cell abundances are relatively high. Low titres imply reduced phage-host contact rates and therefore lower phage-induced mortality6.
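The last two sentences combine a fold-change comparison with a contact-rate argument. A minimal sketch, assuming the standard mass-action encounter model in which contacts per host cell scale as k × P (P the phage titre, k a rate constant left symbolic because the text gives no value), shows that a tenfold drop in titre gives a tenfold drop in per-cell contacts whatever k is, whereas the Synechococcus abundance decline quoted above is only about fourfold. The titres of 1,000 and 100 phage ml⁻¹ used below are hypothetical.

```python
import math


def fold_decrease(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    if before <= 0 or after <= 0:
        raise ValueError("values must be positive")
    return before / after


def contacts_per_host(k: float, phage_per_ml: float) -> float:
    """Mass-action encounter rate per host cell (k is an assumed rate constant)."""
    return k * phage_per_ml


# Synechococcus abundance, coastal -> Sargasso surface (values from the text).
host_drop = fold_decrease(3e4, 7e3)

# Hypothetical titres one order of magnitude apart; k cancels in the ratio.
k = 1.0  # placeholder, not a measured value
contact_drop = contacts_per_host(k, 1000.0) / contacts_per_host(k, 100.0)

if __name__ == "__main__":
    print(f"host abundance fell {host_drop:.1f}-fold ({math.log10(host_drop):.2f} log10 units)")
    print(f"per-host contact rate fell {contact_drop:.0f}-fold for a 10-fold titre drop")
```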

Figure 1 Host ranges of 44 clonal cyanophages exposed to marine Prochlorococcus and Synechococcus cultured isolates. The evolutionary relationships between the 21 host strains are shown in the phylogenetic tree inferred using 16S-23S rDNA spacer regions8. For the cyanophages, red indicates Podoviridae, blue indicates Myoviridae and green indicates Siphoviridae. Filled circles, host strain used to isolate a particular cyanophage; open circles, cross-infection of cyanophage with another host; dash, no infection (that is, no lysis). Symbols in parentheses indicate that the results do not match earlier studies10,13 with these phages and hosts (see Methods for details). HL Pros, high-light-adapted Prochlorococcus; LL Pros, low-light-adapted Prochlorococcus; Marine Syns, marine Synechococcus.

Figure 2 Cyanophage titres, measured using Prochlorococcus host strains, as a function of depth at the Bermuda Atlantic Time Series Station in the Sargasso Sea on 26 September 1999. Nine of the cyanophages used in host range analyses were isolated from this depth profile (see Supplementary Information). a–c, Titres measured using high-light-adapted Prochlorococcus hosts (a) and low-light-adapted hosts (b), and total Prochlorococcus and Synechococcus cell abundances (Prochlorococcus cells × 10⁴ ml⁻¹; Synechococcus cells × 10³ ml⁻¹) and σt (a proxy for water density used to measure the depth of the mixed layer) (c). Titres were undetectable for low-light-adapted Prochlorococcus strains SS120, MIT 9303 and MIT 9313. Error bars represent the s.d. of assays.

Figure 3 Cyanophage titres measured using Synechococcus and Prochlorococcus host strains along a surface water transect from coastal (coast, Woods Hole, Massachusetts) to open ocean (Sargasso) waters, conducted in September 2001. Note that the magnitudes of the y axes are different for a–e and f–j. Ten of the cyanophages used in host range analyses were isolated from along this transect (see Supplementary Information). a–h, Cyanophage titres represent the averages and s.d. of triplicate plaque assays. i, j, Cell concentrations represent averages and s.d. of duplicate flow cytometry assays. Where no bar is shown, there were no plaques (a, c, e–h) or no cells (i). No plaques were observed in any of the surface samples along the transect for Synechococcus strain WH 8020 and Prochlorococcus strains MIT 9313, SS120 and MIT 9211 (data not shown).

Although it is difficult to draw definitive conclusions about causality from such trends because of the complexity of the phage-host interaction, some factors might be implicated. If, for example, host strain microdiversity increased along the transect and cross-infection ability did not increase concurrently, this would lead to lower phage titres yielded by a given suite of host strains20. Indeed, we know that the relative abundance of Synechococcus ecotypes changes from coastal to oligotrophic waters16. However, we observed a systematic decrease in cyanophage titres for all five Synechococcus hosts (note that WH 8020 yielded no plaques) at the extremes of this transect (the coastal and Sargasso sites; Fig. 3a–e). If changing host abundance alone explained the change in titres, and if our suite of host strains is representative of natural diversity, then one might expect at least one host strain to yield increasing titres along the gradient (for example, the 'open ocean strain' WH 8102); however, this was not observed.
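The microdiversity argument in this paragraph can be made concrete with a deliberately simple toy model, offered as an illustration rather than a model used in this study. It assumes that each phage is strictly specific to one host strain, that the free titre of that phage is proportional to the abundance of its strain (the constant `phage_per_host` is hypothetical), and that total host abundance is split evenly among `n_strains` strains. Under these assumptions the titre detected with any single assay host falls as diversity rises, even when total host abundance is unchanged.

```python
def titre_on_one_assay_host(total_hosts: float, n_strains: int, phage_per_host: float) -> float:
    """Toy model: strain-specific phage titre detected by one assay strain.

    Assumes hosts are split evenly among `n_strains`, each phage infects only
    one strain, and free phage titre = phage_per_host * abundance of that strain.
    """
    if n_strains < 1:
        raise ValueError("n_strains must be >= 1")
    return phage_per_host * total_hosts / n_strains


if __name__ == "__main__":
    total = 1e5  # cells per ml, held constant across the scenarios
    for n in (1, 2, 5, 10, 50):
        print(n, titre_on_one_assay_host(total, n, phage_per_host=1.0))
```

If composition shifted unevenly instead, a strain whose share of the community grew would be expected to show rising titres, which is the pattern that was not observed here.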

Another possible explanation for decreasing phage titres from coastal to open ocean ecosystems is decreased nutrient availability along the transect21, resulting in suboptimal growth of host cells in the Sargasso Sea22 relative to Synechococcus at the coastal site23. Viral production is correlated with host growth rates in chemostats24 and in the field25, which could result from nutrient limitation causing physiological changes in the host that stall the lytic process of obligately lytic phages6, or favour lysogeny in
temperate phages6,26,27. Although temperate phages have not been identified for marine Prochlorococcus or Synechococcus, INT-family site-specific recombinases exist in the genomes of Prochlorococcus MED4 and MIT 9313 and Synechococcus WH 8102 (http://www.jgi.doe.gov/JGI_microbial/html/index.html), suggesting that prophages may once have been integrated into these host genomes28,29.

The phage-host system described here should continue to be a useful framework for advancing our understanding of the ecology and evolution of phage-host interactions in marine ecosystems. We have known for some time that cyanophages must have a role in maintaining genetic diversity among hosts10,17. The broad host ranges reported here further indicate their potential for mediating horizontal gene transfer, which may help explain the extensive microdiversity4,8,28,29 seen in these two groups of marine cyanobacteria. The extent to which this potential is realized should become clear as more host and phage genomes are sequenced. Also significant is the coupling between phage morphology and host type. Experiments designed to characterize phage resistance across variable hosts and phages (for example, identification of receptors and of restriction and modification systems) should elucidate the underlying mechanisms responsible for these patterns.
Finally, our analyses of cyanophage titres along a coastal-open ocean transect suggest that the underlying processes responsible for the production of free cyanophages differ along trophic gradients in the oceans. Fully explaining these observations will require the development of approaches that allow one to determine which phages can infect which host(s) in a given community, and an understanding of the relative roles of the lytic and lysogenic phases of the viral life cycle in aquatic systems.

Methods 

Sample  collection 

Water samples for cyanophage titres, cyanophage isolations and cyanobacterial abundances were collected at the Bermuda Atlantic Time Series Station on 26 September 1999 and at four sites along a transect from Woods Hole to the western Sargasso Sea on 5, 16, 17 and 22 September 2001 (see Supplementary Information). Water for cyanophage isolations was filtered (0.4 µm, Poretics number 13028 in 1999; 0.2 µm, Osmonics number K02CP04700 in 2001) and stored at 4 °C in the dark in acid-washed polycarbonate (1999) or glass (2001) bottles until analysis (up to 15 months later). Cyanophage titres remain stable for at least one year7. Control experiments showed that titres were stable over a 15-month period (see Supplementary Information).

Culturing  conditions 

Prochlorococcus and Synechococcus strains were maintained in '75% Pro99' medium, a modification of the 'Pro2' medium with a 75% seawater base and the following final concentrations of N and P: 800 µM NH4Cl, 50 µM NaH2PO4. Cultures were grown at 19–21 °C under constant light: 8–12 µE m⁻² s⁻¹ for low-light-adapted Prochlorococcus and Synechococcus, and 35–45 µE m⁻² s⁻¹ for high-light-adapted Prochlorococcus.

Cyanophage  isolations 

Prochlorococcus cyanophage isolations were done initially using an axenic strain of Prochlorococcus (MED4ax; M. Saito and J.B.W., unpublished observations). Exponentially growing cells were transferred to fresh medium (1 ml:20 ml) and inoculated with 1 ml of 0.4-µm-filtered sea water. The time course of autofluorescence (chlorophyll biomass) of these cultures was then followed with a Turner Designs 10-AU fluorometer. Cultures showing reduced fluorescence relative to controls were filtered and examined for phage particles as previously described. Lysates were stored at 4 °C in the dark. Subsequent isolations using 19 additional host strains were done using the same procedures scaled down to small volumes. Cyanophage isolates used in this study were plaque-purified twice before use, classified by morphology as described by the ICTV15, and named according to suggestions made for cyanophages.
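The fluorescence screen described above can be sketched as a comparison of each inoculated culture with its uninoculated controls. The threshold of 50% of mean control fluorescence is a hypothetical choice made for illustration; the source states only that cultures showing reduced fluorescence relative to controls were examined further.

```python
from statistics import mean


def flag_lysis(sample: list[float], controls: list[list[float]], threshold: float = 0.5) -> bool:
    """Return True if the inoculated culture ever drops below `threshold` x the mean control.

    `sample` and each control series are fluorescence readings taken at the same
    time points. `threshold` is a hypothetical cut-off, not a value from the source.
    """
    if not controls or any(len(c) != len(sample) for c in controls):
        raise ValueError("controls must be non-empty and match the sample length")
    for t, value in enumerate(sample):
        control_mean = mean(c[t] for c in controls)
        if value < threshold * control_mean:
            return True
    return False


if __name__ == "__main__":
    controls = [[10.0, 20.0, 40.0], [10.0, 22.0, 38.0]]
    print(flag_lysis([10.0, 20.0, 5.0], controls))   # culture collapses at the last reading
    print(flag_lysis([10.0, 20.0, 38.0], controls))  # culture tracks the controls
```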

Cyanophage  host  range 

Host range analyses were conducted over a period of about 2 years. Each interaction between a cyanophage and its potential host cell was tested with exponentially growing cells in triplicate on at least two different occasions. Marine bacterial strains were purchased from the Felix d'Herelle Reference Center for Bacterial Viruses (contact H. Ackermann). Several of the Synechococcus cyanophages used in this study ('Syn' phages, S-PM2 and S-WHM1) had been previously examined for host range cross-infectivity10,13 and were maintained as lysates at 4 °C in the dark while host cyanobacterial cultures were serially transferred in late exponential and early stationary phase. A total of 103 of 108 cross-infections using these stored cyanophages yielded similar host range results in this study (Fig. 1). Of the differences observed, four of five were for one cyanophage isolate (Syn10), suggesting that it might have acquired an extended-host-range mutation. Host range can also be altered through DNA modifications that can occur during propagation of a phage on an alternative host. Overall, these results suggest that the cyanophage susceptibility of these host strains and the cross-infectivity of the cyanophages remained relatively stable throughout the 10 or more years of storage and culture maintenance.
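The reproducibility figure above (103 of 108 cross-infections matching earlier results) is a simple agreement proportion. The sketch below compares two sets of lysis calls keyed by (phage, host) pair and lists the discordant pairs; the pair labels in the example are hypothetical, not the study's data.

```python
Pair = tuple[str, str]


def agreement(previous: dict[Pair, bool], current: dict[Pair, bool]) -> tuple[int, int, list[Pair]]:
    """Compare lysis calls for pairs tested in both studies.

    Returns (matching pairs, pairs compared, discordant pairs).
    """
    shared = sorted(set(previous) & set(current))
    discordant = [p for p in shared if previous[p] != current[p]]
    return len(shared) - len(discordant), len(shared), discordant


if __name__ == "__main__":
    earlier = {("phage_X", "host_1"): True, ("phage_X", "host_2"): False, ("phage_Y", "host_1"): True}
    now = {("phage_X", "host_1"): True, ("phage_X", "host_2"): True, ("phage_Y", "host_1"): True}
    print(agreement(earlier, now))
    print(f"reported agreement: {103 / 108:.1%}")
```

Pairs tested in only one study are ignored, so the denominator counts only pairings scored both times.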

Host  cell  and  cyanophage  quantification 

Host cell abundance was measured using a modified Becton-Dickinson FACScan flow cytometer7. Cyanophage titres were quantified using most probable number (MPN) assays (1999) or plaque assays (2001). MPN assays were monitored for lysis relative to controls for 2–3 weeks, depending on the host strain used. For the plaque assays, we plated the host strain in soft agarose (0.4% final concentration; GIBCO BRL, Life Technologies number 5517-014) along with the phage being titred; lawns of cells appeared 8–28 days after inoculation, depending on the strain (M. Saito, M.B.S. and unpublished data).
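The Methods do not give the formula used to convert MPN results into titres. As a hedged sketch, the standard maximum-likelihood MPN estimate assumes phages are Poisson-distributed, so a replicate receiving v ml of sample lyses with probability 1 − exp(−λv); the estimate λ solves Σ pᵢvᵢ / (1 − exp(−λvᵢ)) = Σ nᵢvᵢ over dilution levels i with nᵢ replicates, of which pᵢ lysed. The dilution layout in the example is hypothetical.

```python
import math


def mpn_estimate(levels: list[tuple[float, int, int]], rel_tol: float = 1e-10) -> float:
    """Maximum-likelihood MPN (infective units per ml) from a dilution series.

    Each level is (ml of original sample per replicate, replicates, replicates lysed).
    Solves sum(p*v / (1 - exp(-lam*v))) = sum(n*v) for lam by bisection on log(lam).
    """
    for v, n, p in levels:
        if v <= 0 or n < 1 or not 0 <= p <= n:
            raise ValueError(f"invalid level {(v, n, p)}")
    if all(p == 0 for _, _, p in levels):
        return 0.0  # no lysis at any level
    if all(p == n for _, n, p in levels):
        raise ValueError("all replicates lysed: estimate is unbounded; dilute further")
    target = sum(n * v for v, n, _ in levels)

    def score(lam: float) -> float:
        # Decreasing in lam: positive below the estimate, negative above it.
        return sum(p * v / -math.expm1(-lam * v) for v, _, p in levels if p > 0) - target

    lo, hi = 1e-12, 1.0
    while score(hi) > 0:
        hi *= 10.0
    while hi / lo > 1.0 + rel_tol:
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    # Hypothetical layout: 5 replicates each at 1, 0.1 and 0.01 ml of sample.
    print(mpn_estimate([(1.0, 5, 5), (0.1, 5, 3), (0.01, 5, 0)]))
```

For a single dilution level the estimate reduces to −ln(1 − p/n)/v, which is a convenient check.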

Plaques were counted daily until no new plaques appeared (3–14 days after the first plaques). Titres measured using the two assays were not significantly different (t-test assuming equal variances, α = 0.10).
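Two calculations underlie this paragraph: converting plaque counts to titres and comparing the two assays with a two-sample t-test assuming equal variances. A minimal sketch follows; the dilution convention and example numbers are assumptions. Because the Python standard library has no t distribution, the sketch stops at the statistic and its degrees of freedom; `scipy.stats.ttest_ind` with `equal_var=True` returns the same statistic together with a p-value that can be compared against α = 0.10.

```python
import math
from statistics import mean, stdev, variance


def titre_pfu_per_ml(plaques: int, ml_plated: float, dilution: float = 1.0) -> float:
    """Plaque-forming units per ml of original sample.

    `dilution` is the fraction of original sample in the plated volume
    (1.0 = undiluted, 0.1 = ten-fold dilution).
    """
    return plaques / (ml_plated * dilution)


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic (equal variances) and its degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if sp2 == 0:
        raise ValueError("pooled variance is zero")
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    # Hypothetical triplicate plates, 1 ml of undiluted sample each.
    plates = [titre_pfu_per_ml(n, 1.0) for n in (31, 27, 35)]
    print(f"titre = {mean(plates):.0f} +/- {stdev(plates):.0f} PFU per ml")
    print(pooled_t([120.0, 150.0, 135.0], [110.0, 160.0, 140.0]))
```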

Host  dependency  of  measured  titres 

To test whether the standard suite of host strains used in our assays gave a representative picture of the maximum phage titre obtainable in a given sample, we used every cultured isolate of Prochlorococcus in our collection with a unique ITS rDNA sequence (23 isolates) to assay the cyanophage titre of a 50-m water sample from the Red Sea. The range of titres yielded was representative of the range we measured using our subset of Prochlorococcus isolates (see Supplementary Information).
Low-light-adapted Prochlorococcus species possess specific antennae for each photosystem
Prochlorococcus, the most abundant genus of photosynthetic organisms1, owes its remarkably large depth distribution in the oceans to the occurrence of distinct genotypes adapted to either low- or high-light niches2,3. The pcb genes, encoding the major chlorophyll-binding, light-harvesting antenna proteins in this genus4, are present in multiple copies in low-light strains but as a single copy in high-light strains5. The basis of this differentiation, however, has remained obscure. Here we show that the moderate low-light-adapted strain Prochlorococcus sp. MIT 9313 has one iron-stress-induced pcb gene encoding an antenna protein serving photosystem I (PSI), comparable to isiA genes from cyanobacteria6,7, and a constitutively expressed pcb gene encoding a photosystem II (PSII) antenna protein. By comparison, the very low-light-adapted strain SS120 has seven pcb genes encoding constitutive PSI and PSII antennae, plus one iron-regulated PSI pcb gene, whereas the high-light-adapted strain MED4 has only a constitutive PSII antenna. Thus, it seems that the adaptation of Prochlorococcus to low-light environments has triggered a multiplication and specialization of Pcb proteins comparable to that found for Cab proteins in plants and green algae8.

In order to gain a better understanding of the origin, function and localization of the divinyl-chlorophyll a/b-binding antenna complexes of Prochlorococcus species, and of how these properties relate to the light niche to which different strains are adapted, we have undertaken gene expression and structural studies on the moderate low-light-adapted strain MIT 9313, for which the full genome sequence is available (http://www.jgi.doe.gov/JGI_microbial/html/). This strain contains two pcb genes: pcbA (PMT 1046) and pcbB (PMT 0496). This contrasts with the very low-light-adapted strain SS120, which contains eight pcb genes (pcbA to pcbH); analysis of its genome sequence (http://www.sb-roscoff.fr/Phyto/ProSS120/) revealed the presence of one more pcb gene (pcbH) than previously thought5. It also contrasts with the high-light-adapted strain MED4, which has only a single pcb gene (pcbA).

Recently it was shown by electron microscopy and single-particle analyses that SS120 contains a giant supercomplex consisting of the PSI reaction centre trimer surrounded by a light-harvesting antenna ring composed of 18 Pcb subunits9, similar to the 18-mer IsiA-PSI supercomplex induced in cyanobacteria when deprived of iron6,7. However, studies on MIT 9313 grown under conditions similar to those used for SS120 did not reveal an 18-mer Pcb-PSI supercomplex, but only 'naked' trimeric PSI complexes, which matched the cyanobacterial X-ray structure10 (Fig. 1a, b). Instead, electron microscopy (Figs 1c, d and 2a) indicated that Pcb proteins associate with the dimeric reaction centre complex of PSII to form a Pcb-PSII supercomplex with dimensions of approximately 210 × 290 Å. Our interpretation is that this Pcb-PSII supercomplex consists of eight Pcb subunits, four distributed on each side of the PSII dimer, as shown in Fig. 1c. This is emphasized by overlaying onto the projection map the published X-ray-derived models of the PSII reaction centre dimer and of CP43, a PSII antenna protein structurally similar to Pcb9,11 (Fig. 1d). In some cases, the four Pcb subunits on one side of the dimer were missing (Fig. 1e, f). The 'naked' PSI trimers and Pcb-PSII complexes shown in Fig. 1 were located in a chlorophyll-containing band (band 2 in Fig. 2b, inset, +Fe) obtained by sucrose density centrifugation after solubilizing isolated thylakoid membranes with the detergent β-D-dodecyl maltoside. Also contained in this band were some PSII reaction centre dimers free of Pcb proteins (Fig. 1g, h). Analysis of all discernible particles taken from band 2 sample micrographs resulted in 1,192 particles assigned to PSI and PSII, and gave a PSI:PSII ratio of about 2. We assume this to be indicative of the ratio in the intact thylakoid membrane, given that most PSI and PSII particles were in band 2. Amino-terminal sequencing of the Pcb protein in band 2 from +Fe conditions showed it to be the product of the pcbA gene only, a result that was also found for the free Pcb proteins in band 1 and for the Pcb protein in thylakoid membranes (Fig. 3).
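The particle analysis above reports a total (1,192) and a ratio (about 2) rather than separate counts. Assuming the ratio is exactly 2, which the text gives only approximately, the implied counts follow from N·r/(1 + r) for PSI and N/(1 + r) for PSII:

```python
def split_by_ratio(total: int, ratio: float) -> tuple[float, float]:
    """Split a particle total into (PSI, PSII) counts given a PSI:PSII ratio."""
    if total < 0 or ratio <= 0:
        raise ValueError("total must be >= 0 and ratio > 0")
    psii = total / (1.0 + ratio)
    return total - psii, psii


if __name__ == "__main__":
    psi, psii = split_by_ratio(1192, 2.0)
    print(f"PSI ~ {psi:.0f}, PSII ~ {psii:.0f}")
```

Under that assumption, roughly 795 PSI and 397 PSII particles were scored; the true split could differ slightly because the ratio is rounded.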

The absence of the PcbB protein and of the 18-mer Pcb-PSI supercomplex in iron-replete MIT 9313 cells spurred us to investigate the expression of the pcbA and pcbB genes in this strain. When the cells were grown in medium supplemented with iron (+Fe), we found that only the pcbA gene was expressed (Table 1). However, when cells were transferred to culture medium without added iron (−Fe), expression of the pcbB gene was activated, a surprising result because pcb genes of Prochlorococcus were not known to be regulated by iron, as is the iron-stress-induced isiB gene (Table 1). The latter gene encodes flavodoxin12, which substitutes for ferredoxin as an electron acceptor to PSI, and its expression indicates that the cells had acclimatized to conditions of iron depletion. On the other hand, the expression of



NATURE | VOL 424 | 28 AUGUST 2003 | www.nature.com/nature © 2003 Nature Publishing Group




Nature  Submission  #2003-04-03601 A  Supplementary  Information  (Sullivan  et  al.  Cyanophage  Manuscript),  page  1 


Supplementary Table 1: Detailed information about the Prochlorococcus and Synechococcus cyanophage isolates used in the host-range analyses in this study. TEM morphology designations are as follows: "P" = Podoviridae, "M" = Myoviridae, "S" = Siphoviridae; "+" indicates positively stained particles. Morphometric data for cyanophages are approximated where possible from negatively stained images (except where indicated by "+") without internal standards; "n.a." indicates that no tail was observed.

| Phage name | Locale | Latitude and longitude | Depth (m) | Date collected | TEM morphology | Head diameter (nm) | Tail length × width (nm) | Reference |
|---|---|---|---|---|---|---|---|---|
| -SSP1 | BATS | 31°48'N, 64°16'W | 100 | 6 June 2000 | P+ | 45+ | n.a. | This study |
| -RSP1 | Red Sea | 29°28'N, 34°53'E | 0 | 15 July 2000 | P | 55 | 6 × 10 | This study |
| -RSP2 | Red Sea | 29°28'N, 34°53'E | 0 | 15 July 2000 | P+ | 40+ | n.a. | This study |
| -SSP2 | BATS | 31°48'N, 64°16'W | 120 | 29 Sep 1999 | P+ | 40+ | n.a. | This study |
| -SSP3 | BATS | 31°48'N, 64°16'W | 100 | 29 Sep 1999 | P | 50 | n.a. | This study |
| -SSP4 | BATS | 31°48'N, 64°16'W | 70 | 26 Sep 1999 | P | 50 | n.a. | This study |
| -SSP5 | BATS | 31°48'N, 64°16'W | 120 | 29 Sep 1999 | P | 55 | 6 × 10 | This study |
| -SSP6 | BATS | 31°48'N, 64°16'W | 100 | 26 Sep 1999 | P | 55 | n.a. | This study |
| -SSP7 | BATS | 31°48'N, 64°16'W | 100 | 26 Sep 1999 | P | 50 | n.a. | This study |
| -GSP1 | Gulf Stream | 38°21'N, 66°49'W | 40 | 6 Oct 1999 | P | 45 | n.a. | This study |
| -SSP8 | BATS | 31°48'N, 64°16'W | 100 | 26 Sep 1999 | P+ | 50 | n.a. | This study |
| -RSP3 | Red Sea | 29°28'N, 34°55'E | 50 | 13 Sep 2000 | P+ | 50 | n.a. | This study |
| -SP1 | Slope | 37°40'N, 73°30'W | 83 | 17 Sep 2001 | P | 60 | 8 × 12 | This study |
| Syn12 | Gulf Stream | 36°58'N, 73°42'W | 0 | Dec 1990 | P | 45 | 8 × 10 | Waterbury & Valois 19 |
| Syn5 | Sargasso Sea | 34°06'N, 61°01'W | 0 | July 1990 | P | | | Waterbury & Valois 19 |
| -SSM1 | BATS | 31°48'N, 64°16'W | 100 | 6 June 2000 | M+ | 60+ | 160 × 20 | This study |
| -RSM1 | Red Sea | 29°28'N, 34°53'E | 0 | 15 July 2000 | M | 80 | 110 × 27 | This study |
| -ShM1 | Shelf | 39°60'N, 71°48'W | 40 | 16 Sep 2001 | M | 75 | 120 × 18 | This study |
| -ShM2 | Shelf | 39°60'N, 71°48'W | 0 | 16 Sep 2001 | M+ | | | This study |
| -SSM2 | BATS | 31°48'N, 64°16'W | 100 | 6 June 2000 | M | 85 | 110 × 25 | This study |
| -SSM3 | BATS | 31°48'N, 64°16'W | 100 | 6 June 2000 | M | 95 | 105 × 25 | This study |
| -SSM4 | BATS | 31°48'N, 64°16'W | 10 | 6 June 2000 | M | 80 | 160 × 20 | This study |
| -SSM5 | BATS | 31°48'N, 64°16'W | 15 | 26 Sep 1999 | M+ | 60+ | 20 × 80+ | This study |
| -SSM6 | BATS | 31°48'N, 64°16'W | 40 | 29 Sep 1999 | M | 90 | 150 × 28 | This study |
| -RSM2 | Red Sea | 29°28'N, 34°55'E | 50 | 13 Sep 2000 | M | 75 | 170 × 23 | This study |
| -RSM3 | Red Sea | 29°28'N, 34°55'E | 50 | 13 Sep 2000 | M | 80 | 175 × 23 | This study |
| -SM1 | Slope | 37°40'N, 73°30'W | 0 | 17 Sep 2001 | M | 90 | 80+ × 25 | This study |
| -ShM1 | Shelf | 39°60'N, 71°48'W | 0 | 16 Sep 2001 | M | 80 | 120 × 25 | This study |
| -SSM1 | Sargasso Sea | 34°24'N, 72°03'W | 70 | 22 Sep 2001 | M | 95 | 150 × 25 | This study |
| Syn2 | Sargasso Sea | 34°06'N, 61°01'W | 0 | July 1990 | M | 66 | 149 × 17 | Waterbury & Valois 19 |
| Syn9 | Woods Hole | 41°31'N, 71°40'W | 0 | Oct 1990 | M | 87 | 153 × 19 | Waterbury & Valois 19 |
| Syn19 | Sargasso Sea | 34°06'N, 61°01'W | 0 | July 1990 | M | | | Waterbury & Valois 19 |
| Syn10 | Gulf Stream | 73°42'W | 0 | Dec 1990 | M | 100 | 145 × 19 | Waterbury & Valois 19 |
| Syn26 | NE Providence Channel | 25°53'N, 77°34'W | 0 | Jan 1992 | M | | | Waterbury & Valois 19 |
| Syn30 | NE Providence Channel | 25°53'N, 77°34'W | 0 | Jan 1992 | M | | | Waterbury & Valois 19 |
| Syn33 | Gulf Stream | 25°51'N, 79°26'W | 0 | Jan 1995 | M | | | Waterbury & Valois 19 |
| S-PM2 | English Channel | 50°18'N, 4°12'W | 0 | 23 Sep 1992 | M | 90 | 165 × 20 | Wilson et al. 1993 |
| S-WHM1 | Woods Hole | 41°31'N, 71°40'W | 0 | 11 Aug 1992 | M | 88 | 108 × 23 | Wilson et al. 1993 |
| Syn1 | Woods Hole | 41°31'N, 71°40'W | 0 | August 1990 | M | | | Waterbury & Valois 19 |
| -ShM2 | Shelf | 39°60'N, 71°48'W | 0 | 16 Sep 2001 | M+ | 70+ | 100 × 30 | This study |
| -SSM2 | Sargasso Sea | 34°24'N, 72°03'W | 0 | 22 Sep 2001 | M | 80 | 225 × 22 | This study |
| Syn14 | Gulf Stream | 36°58'N, 73°42'W | 0 | Dec 1990 | M | 93 | 136 × 21 | Waterbury & Valois 19 |
| -SS1 | Slope | 37°40'N, 73°30'W | 60 | 17 Sep 2001 | S+ | 45+ × 90 | 280 × 15 | This study |
| -SS2 | Slope | 37°40'N, 73°30'W | 83 | 17 Sep 2001 | S | 50 × 100 | 260 × 12 | This study |

Supplementary Figure 1: Proportion of Prochlorococcus and Synechococcus cyanobacterial lysates with
a given cyanophage morphology. Data represent the presence or absence of a given morphology in each
lysate and do not include observations of the doubly plaque-purified cyanophage isolates used in the host
range analyses. The numbers of observations made from the lysates for each category were: HL
Prochlorococcus = 107, LL Prochlorococcus = 43, Synechococcus = 58.

Supplementary Figure 2: Cyanophage titers, measured using Prochlorococcus and Synechococcus host
strains, as a function of depth in the oligotrophic western Sargasso Sea (34°24'N, 72°03'W) on 22 Sept.
2001. (a) Titers measured using Prochlorococcus hosts; (b) titers measured using Synechococcus hosts;
(c) total Prochlorococcus and Synechococcus cell abundances (Prochlorococcus cells x 10^4 ml^-1 and
Synechococcus cells x 10^3 ml^-1) and sigma-t (σt, a proxy for water density used to determine the depth
of the mixed layer). Titers were negligible for host strains Prochlorococcus MIT9312, MIT9211,
MIT9313, SS120 and Synechococcus WH6501, WH8018, WH8101. Error bars represent the standard
deviation of the assays.
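Because the caption uses σt only as a density proxy for locating the mixed layer, a minimal sketch of that idea may help. It assumes σt is seawater density evaluated at atmospheric pressure minus 1000 kg m^-3, and it places the mixed-layer depth at the first sampled depth where σt exceeds the shallowest value by more than 0.125 kg m^-3. That threshold, the function names and the example profile are illustrative assumptions, not the criterion or data used for this figure.

```python
def sigma_t(density_kg_m3: float) -> float:
    """sigma-t: seawater density at atmospheric pressure minus 1000 kg m^-3."""
    return density_kg_m3 - 1000.0


def mixed_layer_depth(depths_m, sigma_ts, threshold=0.125):
    """Shallowest depth where sigma-t exceeds the shallowest value by more than
    `threshold` (kg m^-3). Depths must be sorted from shallow to deep.
    Returns None if the criterion is never met within the profile.
    """
    if not depths_m or len(depths_m) != len(sigma_ts):
        raise ValueError("need equal-length, non-empty depth and sigma-t lists")
    reference = sigma_ts[0]
    for depth, value in zip(depths_m, sigma_ts):
        if value - reference > threshold:
            return depth
    return None


# Hypothetical profile, not the data plotted in Supplementary Figure 2
depths = [0, 20, 40, 60, 80, 100]
densities = [1024.90, 1024.92, 1024.95, 1025.20, 1025.60, 1025.90]
profile = [sigma_t(rho) for rho in densities]
print(mixed_layer_depth(depths, profile))  # 60
```

Other published criteria use a different threshold or interpolate between samples; the sketch only shows why a density proxy such as σt is sufficient for this purpose.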

Supplementary Figure 3: Water collected from Dyer's Dock, Woods Hole, MA on 3 Oct 2000 was
immediately sub-sampled to prepare three separate filtrates, which were used to test the effects of long-term
storage, under our storage methods, on cyanophage titers. Cyanophage titers were measured periodically using
Synechococcus WH8012 in a most probable number (MPN) assay (3 Oct 2000, 10 Oct 2000, 24 Oct 2000,
28 Nov 2000, 22 Jan 2001, 28 Dec 2001) over the course of 15 months. Data presented are the average and
standard deviation of triplicate MPN assays from each sub-sample.
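The MPN titers in this figure come from dilution assays in which each well is scored only as lysed or not lysed. As a hedged illustration of how such scores can be turned into a titer, the sketch below computes a maximum-likelihood MPN estimate under the usual assumption that infectious phage are Poisson-distributed among wells and that one phage is enough to lyse a well. It also reports the mean and sample standard deviation of replicate assays. The function names, well volumes and counts are hypothetical, and this is not necessarily the calculation used for these assays.

```python
import math
import statistics


def mpn_estimate(volumes_ml, n_wells, n_positive, rel_tol=1e-12):
    """Maximum-likelihood MPN (infectious units per ml).

    volumes_ml[i]: ml of original sample added to each well at dilution i
    n_wells[i]:    wells inoculated at dilution i
    n_positive[i]: wells that lysed at dilution i
    Assumes phage are Poisson-distributed among wells and one phage suffices
    to lyse a well.
    """
    if sum(n_positive) == 0:
        return 0.0
    if all(p == n for p, n in zip(n_positive, n_wells)):
        raise ValueError("all wells lysed: MPN is unbounded; dilute further")

    total = sum(n * v for n, v in zip(n_wells, volumes_ml))

    def score(lam):
        # Likelihood equation: sum p*v / (1 - exp(-lam*v)) = sum n*v.
        # This difference decreases monotonically as lam increases.
        return sum(p * v / -math.expm1(-lam * v)
                   for p, v in zip(n_positive, volumes_ml) if p > 0) - total

    lo, hi = 1e-12, 1.0
    while score(hi) > 0:          # widen the bracket until it contains the root
        lo, hi = hi, hi * 2.0
    while hi - lo > rel_tol * hi:  # bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def replicate_summary(titers):
    """Mean and sample standard deviation of replicate titers."""
    return statistics.mean(titers), statistics.stdev(titers)


# Hypothetical single-dilution example: 5 of 10 wells lysed, 1 ml per well
print(round(mpn_estimate([1.0], [10], [5]), 4))  # 0.6931, i.e. ln 2
```

For a single dilution the estimate reduces to -ln(1 - p/n) / v, which is a convenient check on the bisection.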

Supplementary Figure 4: Every isolate in the MIT Prochlorococcus culture collection with a unique ITS
rDNA sequence was used in plaque assays to determine the strain-specific cyanophage titer of a water
sample taken from 50 m depth in the Red Sea on 13 Sep 2000. These data were used to explore, for this
one sample only, whether the strains we used to assay phage titers throughout our studies
(white bars) yielded results that might be deemed "typical" with respect to the rest of the host strains
in our collection (dark bars). None of the other host strains yielded significantly higher
titers than those used in this study. Data shown are the average and standard deviation of triplicate plaque
assays.
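Strain-specific titers from plaque assays come down to plaque counts scaled by the volume and dilution plated. Below is a minimal sketch, assuming titers are expressed as plaque-forming units (PFU) per ml of the original water sample and that the reported spread is the sample standard deviation of triplicates. The helper names, plated volume and counts are hypothetical.

```python
import statistics


def pfu_per_ml(plaques: int, volume_plated_ml: float, dilution: float = 1.0) -> float:
    """Plaque-forming units per ml of the original sample.

    dilution is the fraction of original sample in the plated material
    (e.g. 0.1 for a ten-fold dilution; 1.0 for undiluted water).
    """
    if volume_plated_ml <= 0 or not 0 < dilution <= 1:
        raise ValueError("volume must be positive and dilution in (0, 1]")
    return plaques / (volume_plated_ml * dilution)


def strain_titer(plaque_counts, volume_plated_ml, dilution=1.0):
    """Mean and sample SD of titers from replicate plaque assays on one host strain."""
    titers = [pfu_per_ml(n, volume_plated_ml, dilution) for n in plaque_counts]
    return statistics.mean(titers), statistics.stdev(titers)


# Hypothetical triplicate counts from plating 1 ml of undiluted seawater
mean, sd = strain_titer([12, 15, 9], volume_plated_ml=1.0)
print(f"{mean:.1f} +/- {sd:.1f} PFU/ml")  # 12.0 +/- 3.0 PFU/ml
```

Comparing strains then amounts to comparing these per-strain means and their replicate spread, as in the figure.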

The marine cyanobacteria Prochlorococcus and Synechococcus are the numerically
dominant primary producers in the oceans, are globally distributed and are known to be
infected by viruses (phage) of three morphologies. The diversity of one of these families of
phage, the myophage (phage of the Myoviridae morphology), has been assessed in both
natural phage communities and cultured Synechococcus myophage isolates, using the g20
gene, a homolog of the gene encoding the T4 portal protein. These studies showed that
environmental  g20  sequences  formed  9  distinct  phylogenetic  clusters,  only  3  of  which  are 
represented  in  cultured  myophage  isolates.  The  recently  assembled  MIT  phage  collection 
includes  a  diverse  selection  of  myophage,  isolated  using  a  broad  range  of  cyanobacterial 
host  strains  (5  Prochlorococcus  and  8  Synechococcus  strains)  from  numerous  sites  in  the 
Atlantic  Ocean  and  the  Gulf  of  Aqaba.  We  amplified  and  sequenced  g20  fragments  from 
this  collection  to  see  whether  phage  isolated  on  Prochlorococcus  hosts  differed  from  these 
published  g20  sequences.  We  found  that  the  g20  sequences  from  our  collection  clustered 
with  the  same  three  g20  clades  represented  by  other  myophage  isolates.  We  suggest  the  six 
environmental  g20  clades  with  no  cultured  representatives  are  likely  to  be  from  myophages 
that  infect  as  yet  uncultured  hosts.  Finally,  the  lack  of  obvious  relationships  between  g20 
diversity  and  a  variety  of  factors  associated  with  the  phage  isolates  suggests  the  need  for 
new  taxonomic  markers  that  allow  the  tracking  of  ecologically  discrete  groups  of  phage, 
such  as  phage  that  infect  particular  hosts. 

The discovery that virus-like particles occur at high abundances (up to 10^8 ml^-1) in the
oceans (Bergh, 1989; Bratbak et al., 1990; Proctor and Fuhrman, 1990) has prompted efforts to
elucidate the roles of viruses in these systems. To this end, there has been extensive work on the
phage-host system of the marine cyanobacteria Prochlorococcus and Synechococcus, which are
globally important primary producers in the oceans (Waterbury et al., 1986; Partensky et al.,
1999). Their phages are abundant (Waterbury and Valois, 1993; Suttle and Chan, 1994; Suttle,
2000;  Lu  et  al.,  2001;  Frederickson  et  al.,  2003;  Marston  and  Sallee,  2003;  Sullivan  et  al.,  2003), 
are  small  but  significant  contributors  to  host  mortality  (Waterbury  and  Valois,  1993;  Suttle  and 
Chan,  1994;  Suttle,  2000),  and  are  thought  to  play  a  role  in  maintaining  the  extensive 
microdiversity  of  the  marine  cyanobacteria  (Waterbury  and  Valois,  1993;  Suttle  and  Chan,  1994; 
Marston  and  Sallee,  2003;  Sullivan  et  al.,  2003). 

Studying  the  diversity  of  phages  has  proven  difficult  because  no  universal  gene, 
analogous  to  the  16S  rRNA  gene  that  is  used  as  the  taxonomic  marker  for  all  microbes,  exists 
throughout  all  phage  families  (Paul  et  al.,  2002).  Thus,  family-specific  genes  have  recently  been 
proposed  for  use  as  taxonomic  tools  (Rohwer  and  Edwards,  2002).  For  the  Myoviridae,  which 
are  highly  represented  among  Synechococcus  cyanophage  isolates  (Suttle  and  Chan,  1993; 
Waterbury  and  Valois,  1993;  Wilson  et  al.,  1993;  Sullivan  et  al.,  2003),  a  fragment  of  the  gene 
homologous  to  the  coliphage  T4  portal  protein  gene,  g20,  has  been  used  for  this  purpose  (Fuller  et 
al.,  1998;  Zhong  et  al.,  2002).  The  g20  homologue  was  initially  chosen  as  a  diversity  indicator 
because  hybridization  studies  of  Synechococcus  cyanomyophage  DNA  suggested  this  region  was 
conserved  across  8  phage  isolates  (Fuller  et  al.,  1998).  Subsequent  study  showed  that  the  g20 
homologue is part of a structural cassette (g18, g19, g20, g21, g22, g23) which is highly
conserved among Myoviridae from hosts as divergent as proteobacteria and cyanobacteria
(Hambly et al., 2001). The evolution of g20 is likely constrained because its protein product (gp20)
initiates  capsid  assembly  in  T4,  a  process  involving  geometric  precision  (Coombs  and  Eiserling, 
1977;  Hsiao  and  Black,  1978;  van  Driel  and  Couture,  1978)  through  the  formation  of  a  proximal 
vertex  (van  Driel  and  Couture,  1978)  used  for  DNA  packaging  (Hsiao  and  Black,  1978)  and 
binding  the  capsid  to  the  tail  junction  (Coombs  and  Eiserling,  1977). 

The  high  degree  of  conservation  in  the  g20  gene  has  allowed  the  design  of  PCR  primers 
to  amplify  g20  fragments  from  a  range  of  myophage,  facilitating  diversity  studies  of  as-yet 
uncultured  naturally  occurring  myophage  (Zhong  et  al.,  2002).  Myophage  g20  diversity  has  been 
measured  in  a  variety  of  aquatic  environments,  and  offers  a  first  glimpse  at  the  diversity  of  these 
phage.  Studies  using  non-degenerate  PCR  primers  and  evaluation  of  the  amplicons  using 
denaturing  gradient  gel  electrophoresis  (DGGE)  banding  patterns  and  terminal-restriction 
fragment  length  polymorphism  (T-RFLP)  have  revealed  variability  in  g20  diversity  across  space 
and time. Along a transect from the Falkland Islands to the United Kingdom, 2-12 g20 DGGE
bands were observed in 1-liter water samples (Wilson et al., 1999, 2000). A similar range in
diversity  was  observed  throughout  a  depth  profile  in  waters  off  British  Columbia  (Frederickson  et 
al.,  2003)  and  a  seasonal  cycle  in  a  freshwater  lake  in  France  (Dorigo  et  al.,  2004)  and  in  the 
estuarine  waters  of  the  Chesapeake  Bay  (Wang  and  Chen,  2004).  These  studies  concluded  that 
g20  diversity  was  as  great  within  a  sample  as  between  oceans  (Wilson  et  al.,  1999),  that  phage 
g20  diversity  increased  as  Synechococcus  abundance  increased  (Wilson  et  al.,  1999,  2000; 
Frederickson  et  al.,  2003;  Wang  and  Chen,  2004)  and  that  some  g20  types  were  ubiquitous 
(Wilson  et  al.,  1999,  2000;  Frederickson  et  al.,  2003;  Dorigo  et  al.,  2004). 

Cloning  and  sequencing  of  g20  PCR  amplicons  has  allowed  phylogenetic  analyses  of 
myophage  diversity  along  a  coastal-to-open  Atlantic  Ocean  transect  (Zhong  et  al.,  2002)  and  over 
a  3-year  period  in  coastal  waters  off  Rhode  Island  (Marston  and  Sallee,  2003).  Again,  there  was 
high  variability  in  g20  diversity  among  sites,  with  between  thirteen  and  twenty-nine  different  g20 
sequences  obtained  at  different  sites  along  the  transect  (Zhong  et  al.,  2002).  While  these  authors 
concluded that there was some correlation between ocean habitat and g20 phylogeny (e.g.,
phylogenetic  cluster  II  represents  “oceanic”  g20  sequences),  further  sampling  suggested  this  was 
not  the  case,  as  seven  g20  sequences  from  coastal  Synechococcus  myophage  isolated  from  Rhode 
Island  waters  clustered  with  the  putative  “oceanic”  sequences  (Marston  and  Sallee,  2003).  Clone 
libraries  from  the  deep  chlorophyll  maximum  had  a  higher  diversity  of  g20  sequences  than  those 
from  surface  water  samples  in  both  the  Gulf  Stream  and  the  Sargasso  Sea  (Zhong  et  al.,  2002). 
Finally,  g20  sequences  observed  in  these  field  studies  are  not  all  represented  in  cultured  isolates. 
The  sequencing  of  207  clones  amplified  from  samples  along  the  Atlantic  Ocean  transect  revealed 
114 unique g20 sequences (Zhong et al., 2002), which grouped into 9 phylogenetic clusters, only
3 of which had cultured representatives. Subsequent culturing of phage
isolated  from  Rhode  Island  coastal  waters  using  Synechococcus  hosts  led  to  isolates  whose  g20 
sequences also grouped with the 3 previous isolate-containing clusters (Marston and Sallee,
2003). Thus, 6 of the 9 environmental myophage g20 clusters lack cultured isolates.

The  MIT  phage  collection  (Sullivan  et  al.,  2003)  consists  of  phage  isolated  from  seawater 
samples  collected  from  various  depths  in  the  euphotic  zone  throughout  the  Atlantic  Ocean  and  the 
Gulf  of  Aqaba.  The  phage  were  isolated  using  strains  of  Prochlorococcus  and  Synechococcus 
that  represented  the  known  genetic  diversity  of  these  host  cells  at  the  time  this  study  began 
(Rocap et al., 2002; but see Fuller et al., 2003). Because the cultured myophage used in
previous  g20  diversity  studies  were  isolated  using  only  Synechococcus  strains,  we  wondered 
whether  analysis  of  our  collection  would  expand  the  database  of  g20  sequences  for  phage 
isolates,  and  possibly  help  explain  the  broad  diversity  observed  in  the  field.  However,  as  we 
report  below,  we  found  that  g20  sequences  from  our  isolates  cluster  only  with  the  existing  g20 
sequences  for  cultured  phage,  and  do  not  help  identify  the  sequences  from  field  samples  for 
which  there  are  no  cultured  counterparts. 

MATERIALS  AND  METHODS 

Myophage isolates. Forty-five cyanobacterial myophages were isolated (Table 1) as described previously
(Waterbury and Valois, 1993; Wilson et al., 1993; Marston and Sallee, 2003; Sullivan et al.,
2003). S-PM2 and S-WHM1 were provided by W. Wilson and all S-RIM phages were provided
by  M.  Marston.  The  specificity  of  cyanomyophage  g20  primers  was  tested  using  five  marine 
Pseudoalteromonas spp. bacteriophages (HER320, HER321, HER322, HER327, HER328;
Wichels et al., 1998) that were purchased from the Felix d'Herelle Reference Center for
Bacterial Viruses (contact H. Ackermann), as well as 7 heterotrophic bacteriophages (EH6-φ1,
IH6-φ7, IH11-φ2, IH11-φ5, CB8-φ2, CB8-φ6, CB-φ8; Zhong et al., 2002) kindly provided by F. Chen.

Testing of published primer sets. The published g20 primer sets (Wilson et al., 1999; Zhong et al., 2002),
CPS4/CPS5 used in DGGE studies and CPS1/CPS8 used in sequencing studies, were designed
without sequence information from Prochlorococcus myophages. We found that CPS4/CPS5
amplified g20 sequences from only 80% of the new Prochlorococcus and Synechococcus
myophages in our collection, while CPS1/CPS8 amplified g20 sequences from only 44% of these
isolates (Table 1). While the CPS4GC/CPS5 primer set amplifies g20 from most of our myophage
isolates, the PCR product is too small (~165 bp) for subsequent phylogenetic analyses. In contrast,
the CPS1/CPS8 primer set amplifies a larger PCR product, but from fewer isolates (25 of 45; Table 1).

To obtain g20 PCR amplicons from myophages that would not amplify using the published
primers, we added degeneracies to both CPS1 and CPS8, and shifted the CPS8 primer based upon
genomic sequence data obtained for two of the Prochlorococcus myophage isolates (P-SSM2,
P-SSM4; http://www.jgi.gov/JGI_microbial/html/index.html). The redesigned primers were CPS1.1
(5'-GTAGWATWTTYTAYATTGAYGTWGG-3') and CPS8.1 (5'-ARTAYTTDCCDAYRWAWGGWTC-3').
This redesigned primer set (CPS1.1/CPS8.1) produced PCR amplicons of the expected size (~594 bp)
from all 45 cyanomyophage isolates (Table 1), and when sequenced these amplicons proved to be
from g20 homologues (Figure 1). Despite its degeneracy, the CPS1.1/CPS8.1 primer set specifically
amplified g20 sequences from our cyanobacterial myophage isolates, as shown by specificity
testing of the primers (Table 1) against 7 heterotrophic marine bacteriophages from Maryland,
USA waters (Zhong et al., 2002), 5 heterotrophic marine bacteriophages from the North Sea
(Wichels et al., 1998), and 16 Podoviridae and 2 Siphoviridae that infect Synechococcus
and Prochlorococcus (Sullivan et al., 2003).
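Each degenerate position multiplies the number of distinct oligonucleotides in a primer mix, which gives one measure of the cost of the degeneracy added to CPS1.1 and CPS8.1. The sketch below uses the standard IUPAC nucleotide codes to count and enumerate the variants encoded by the two sequences as written above; the helper names are illustrative.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes mapped to the bases each one represents
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str):
    """Yield every non-degenerate sequence encoded by the primer."""
    for combo in product(*(IUPAC[base] for base in primer.upper())):
        yield "".join(combo)


CPS1_1 = "GTAGWATWTTYTAYATTGAYGTWGG"
CPS8_1 = "ARTAYTTDCCDAYRWAWGGWTC"
print(len(CPS1_1), degeneracy(CPS1_1))  # 25 64
print(len(CPS8_1), degeneracy(CPS8_1))  # 22 1152
```

The two D (A/G/T) positions in CPS8.1 account for a factor of 9 on their own, which partly explains why its pool is much larger than that of CPS1.1.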

PCR amplification and sequencing. Previous g20 PCR primer sets, the non-degenerate
CPS4GC/CPS5 (Wilson et al., 1999) and the degenerate CPS1/CPS8 (Fuller et al., 1998; Zhong et
al., 2002), were designed to amplify ~200 bp and ~592 bp fragments, respectively, of the T4 g20
homologue in myophages.

PCR reactions for CPS4GC/CPS5 and CPS1/CPS8 were conducted as described
previously (Wilson et al., 1999; Zhong et al., 2002). Briefly, 2 µl of cyanophage lysate was
added as DNA template to a PCR reaction mixture (total volume 50 µl) containing the following:
20 pmol each of a forward and reverse primer, 1x PCR buffer (50 mM Tris-HCl, 100 mM NaCl,
1.5 mM MgCl2), 250 µM of each dNTP, and 0.75 U of Expand High Fidelity DNA polymerase
(Roche, Indianapolis, IN). PCR amplification was carried out with a PTC-100 DNA Engine
Thermocycler (MJ Research, San Francisco, CA). Optimized thermal cycling conditions differed
slightly from those previously reported, as follows. CPS4GC/CPS5 required an initial denaturation
step of 94°C for 3 minutes, followed by 35 cycles of denaturation at 94°C for 1 minute, annealing at
50°C for 1 minute, ramping at 0.3°C/s, and elongation at 73°C for 1 minute, with a final elongation
step at 73°C for 4 minutes. Both primer sets CPS1/CPS8 and CPS1.1/CPS8.1 required an
initial denaturation step of 94°C for 3 minutes, followed by 35 cycles of denaturation at 94°C for
15 s, annealing at 35°C for 1 minute, ramping at 0.3°C/s, and elongation at 73°C for 1 minute, with a
final elongation step at 73°C for 4 minutes. Systematic PCR screening using various primer sets
was conducted using the same PCR reaction conditions and amplification protocol, but replacing
the High Fidelity DNA polymerase with the less expensive Taq DNA polymerase (Invitrogen,
Carlsbad, CA) and using only 20 µl reactions, since replicate (range 3-8) PCR reactions were
pooled before sequencing to decrease PCR bias (Polz and Cavanaugh, 1998). In all cases, a 5-10 µl
aliquot of PCR product was analyzed in a 1.5% TAE gel stained with EtBr. The gel image was
captured and analyzed with an Eagle Eye II gel documentation system (Stratagene, La Jolla, CA).
For purification and sequencing, replicate PCR reactions were combined, run out on a 1.5% TAE
gel and purified using the QIAGEN QIAquick gel extraction kit (Qiagen, Valencia, CA). The
purified PCR products were sequenced directly on both strands using the degenerate PCR primers
used to obtain the product (CPS1, CPS8, CPS1.1, CPS8.1), with best results at primer
concentrations ~10-fold those suggested by the sequencing facility (40 pmol per reaction). To
have greater confidence in negative PCR results, templates that did not produce amplified product
were tested against optimized primer sets multiple times (data not shown).
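The two cycling programs differ mainly in the denaturation time within each cycle and in the annealing temperature. The sketch below encodes both programs as data and totals the time spent at set temperatures. Ramp time (0.3°C/s for the ramped step) is deliberately excluded, so actual run times will be longer. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class CyclingProgram:
    initial: tuple
    cycle: tuple
    n_cycles: int
    final: tuple

    def hold_minutes(self) -> float:
        """Minutes spent at set temperatures; ramp time is not included."""
        seconds = (sum(s.seconds for s in self.initial)
                   + self.n_cycles * sum(s.seconds for s in self.cycle)
                   + sum(s.seconds for s in self.final))
        return seconds / 60


INITIAL = (Step("initial denaturation", 94, 180),)
FINAL = (Step("final elongation", 73, 240),)

CPS4GC_CPS5 = CyclingProgram(
    initial=INITIAL,
    cycle=(Step("denature", 94, 60), Step("anneal", 50, 60), Step("elongate", 73, 60)),
    n_cycles=35,
    final=FINAL,
)

# Shared by CPS1/CPS8 and CPS1.1/CPS8.1
CPS1_CPS8 = CyclingProgram(
    initial=INITIAL,
    cycle=(Step("denature", 94, 15), Step("anneal", 35, 60), Step("elongate", 73, 60)),
    n_cycles=35,
    final=FINAL,
)

print(CPS4GC_CPS5.hold_minutes())  # 112.0
print(CPS1_CPS8.hold_minutes())    # 85.75
```

Storing the programs as immutable data makes it easy to check that the CPS1/CPS8 and CPS1.1/CPS8.1 runs really share one profile.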

Where  identical  g20  sequences  were  observed  in  our  study,  we  confirmed  the  match  was 
real  and  not  the  result  of  PCR  contamination  by  re-amplifying  and  sequencing  directly  from  fresh 
phage isolates (e.g., for P-SSM4, P-RSM3, S-SSM2, and "Syn" phages Syn2, Syn9, Syn10,
Syn26, Syn30, Syn33, Syn1, Syn19), many of which were obtained from stocks kept at a
separate  institution. 

Phylogenetic  analysis.  Paired  sequence  data  were  aligned  using  ClustalW  (Thompson  et  al., 
1997)  and  corrected  manually  using  the  sequence  chromatograms.  Consensus  sequences  for  each 
cyanophage  isolate  were  then  translated  in-frame  into  amino  acids.  Multiple  sequence  alignments 
of  translated  amino  acid  consensus  sequences  were  done  with  ClustalW  using  the  Gonnet  protein 
weight  matrix,  a  gap  opening  penalty  of  15  and  gap  extension  penalty  of  0.30  (although  changing 
these  penalties  did  not  significantly  alter  the  alignments).  Phylogenetic  reconstruction  was  done 
using  PAUP  4.0  (Swofford,  2002)  for  parsimony  and  distance  trees  and  Tree-Puzzle  5.0  (Schmidt 
et  al.,  2002)  for  maximum  likelihood  trees.  Evolutionary  distances  for  neighbor-joining  trees 
were  calculated  based  on  mean  character  distances,  while  evolutionary  distances  for  maximum 
likelihood trees were calculated using the JTT model of substitution, assuming a gamma-distributed
model of rate heterogeneity with 16 gamma-rate categories empirically estimated from the data.
A heuristic search with 10 random addition replicates using the tree-bisection-reconnection
branch-swapping algorithm was used for parsimony trees. Bootstrap analysis was used to estimate
node reproducibility and tree topology for neighbor-joining (1,000 replicates) and parsimony
(100 replicates) trees, while quartet puzzling (10,000 replicates) provided support values for the
maximum likelihood tree. The g20 sequence from coliphage T4 was used as the outgroup
taxon for all analyses.
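Neighbor-joining distances here were based on mean character distances. Below is a minimal sketch of such an uncorrected distance for aligned amino acid sequences. It assumes that sites with a gap, missing or unknown residue in either sequence are skipped pairwise. PAUP's exact treatment of gaps and ambiguity codes may differ, and the sequence names and residues in the example are made up.

```python
from itertools import combinations

SKIP = set("-?X")  # gap, missing, unknown amino acid


def mean_character_distance(a: str, b: str) -> float:
    """Proportion of differing residues over sites where both sequences have
    an unambiguous residue (gap/missing sites are skipped pairwise)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in SKIP or y in SKIP:
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment: dict) -> dict:
    """Symmetric pairwise distances keyed by (name, name)."""
    d = {(name, name): 0.0 for name in alignment}
    for p, q in combinations(alignment, 2):
        d[p, q] = d[q, p] = mean_character_distance(alignment[p], alignment[q])
    return d


# Toy aligned fragments (hypothetical, not real g20 sequences)
toy = {"phageA": "MKV-LE", "phageB": "MKIALE", "phageC": "MRIALD"}
dist = distance_matrix(toy)
print(dist["phageA", "phageB"], dist["phageA", "phageC"])  # 0.2 0.6
```

A matrix like this is the input a neighbor-joining program clusters; uncorrected distances can underestimate divergence between distant sequences, which is one reason the model-based (JTT plus gamma) likelihood analysis is run alongside it.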

Phylogenetic analyses of 183 amino acid positions from viral g20 sequences from 79 taxa yielded
robust, similar trees using both algorithmic (neighbor-joining) and tree-searching (parsimony and
maximum likelihood) methods. The translated g20 sequences contained phylogenetically
informative regions (e.g., for parsimony analyses, 41 positions were constant, 25 were parsimony
uninformative and 117 were parsimony informative). Differences between the parsimony,
distance and maximum likelihood trees were limited to the branching order of the terminal nodes
in a given cluster. To evaluate whether g20 sequence diversity correlated with a suite of phage
isolation parameters, we empirically defined a "well-supported node" as one where the average
support across all three phylogenetic methods was 80% or greater.
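Two definitions in the preceding paragraph can be expressed directly in code: the parsimony classification of alignment columns (constant, uninformative, informative) and the "well-supported node" rule. The sketch below assumes that gaps and missing data are ignored when a column is classified and that support values are percentages. The toy alignment and support values are hypothetical.

```python
from collections import Counter
from statistics import mean

MISSING = set("-?")


def classify_site(column: str) -> str:
    """Parsimony category of one alignment column (gaps/missing ignored)."""
    counts = Counter(c for c in column.upper() if c not in MISSING)
    if len(counts) <= 1:
        return "constant"
    # Informative: at least two states, each shared by at least two taxa
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "informative"
    return "uninformative"


def tally_sites(sequences):
    """Count constant, uninformative and informative columns in an alignment."""
    columns = ("".join(column) for column in zip(*sequences))
    return Counter(classify_site(col) for col in columns)


def well_supported(nj_bootstrap, parsimony_bootstrap, puzzle_support, cutoff=80.0):
    """Node is well supported if mean support over the three methods >= cutoff (%)."""
    return mean([nj_bootstrap, parsimony_bootstrap, puzzle_support]) >= cutoff


toy = ["MKVLEA", "MKILEA", "MRILDA", "MRVLDG"]  # hypothetical alignment
print(tally_sites(toy))
print(well_supported(95, 70, 78), well_supported(90, 60, 70))  # True False
```

Under the averaging rule, a node can qualify even when one method gives it weak support (70% parsimony bootstrap in the first example), provided the other two methods compensate.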

Nucleotide  sequence  accession  numbers.  The  nucleotide  sequences  determined  in  this  study 
were  submitted  to  GenBank  and  assigned  accession  numbers  AYXXXXXX  to  AYXXXXXX. 

RESULTS  AND  DISCUSSION 

Using the g20 sequences obtained from our myophage collection and those in the
database, we asked whether our myophages contained g20 sequences that were novel or that
clustered with environmental clades lacking cultured representatives. All 9 previously defined
phylogenetic clusters (Zhong et al., 2002) were reproduced in each of our phylogenetic trees (Fig. 1).
None of the g20 sequences from the 44 sequenced phage isolates (one isolate, S-RIM9, was screened
but not sequenced) grouped with the environmental clusters that lacked cultured representatives.

The identical g20 sequences from phages P-SSM9, P-SSM11 and P-SSM12, along with that from
S-RIM6, form a fourth monophyletic cluster within the clusters containing cultured
representatives (I, II, III). Finally, while phylogenetic analyses grouped the g20 sequences from
S-BnM1 and P-ShM1 with those from other cultured myophages, low bootstrap support made their
placement within a particular cluster ambiguous.

These  analyses  suggest  that  while  g20  sequences  from  our  collection  are  often  novel,  they 
are  found  only  in  those  g20  clusters  containing  cultured  representatives.  In  other  words,  these 
new  sequences  do  not  help  “identify”  the  environmental  sequences  in  clusters  that  lack  cultured 
representatives.  We  suggest  that  the  6  environmental  g20  sequence  groups  lacking  cultured 
representatives  may  be  from  two  possible  sources:  phages  that  infect  as  yet  uncultured 
cyanobacteria, or phages that infect other surface-dwelling non-cyanobacterial hosts. Although
the g20 primers used in the studies that yielded these sequences have been tested for non-specific
amplification against a range of cultivated g20-containing bacteriophages, it is likely that, given the
high viral diversity of the oceans (Breitbart et al., 2002), the primers are not strictly specific for
cyanophages. If these g20 sequences are from bacteriophages, their relative abundance in the
environmental g20 tree (these 6 of the 9 clusters contain most of the environmental g20 sequences;
Zhong et al., 2002) qualitatively suggests that the hosts of the phages containing these g20
sequences may be common surface water microbes. By this criterion, candidate hosts include
Pelagibacter (formerly SAR11), Roseobacter (formerly SAR83), SAR86, and SAR116
(Giovannoni and Rappe, 2000).

As  previously  observed  (Zhong  et  al.,  2002;  Marston  and  Sallee,  2003),  the  nucleotide 
and  amino  acid  sequence  divergence  of  the  g20  regions  amplified  from  our  myophage  isolates 
suggests  the  g20  portal  protein  gene  is  highly  conserved.  This  region  of  the  g20  homologue  from 
coliphage  T4  was  49.6-54.5%  identical  at  the  nucleotide  and  39.8-45.3%  identical  at  the  amino 
acid  level  to  the  g20  sequences  from  our  myophage  isolates.  Among  the  cyanomyophage  g20 
sequences,  there  was  less  than  50%  divergence  of  both  nucleotide  and  amino  acid  sequences,  with 
ranges  of  pairwise  identities  from  59.8-100%  nucleotide  identity  and  59.6-100%  amino  acid 
identity.  Thirteen  groups  of  g20  sequences  from  myophage  isolates  and  environmental  sequences 
contained  identical  amino  acid  sequences  (numbered  1-13  in  Fig.  1).  Such  high  conservation  of 
g20  sequences  has  been  noted  previously  (Zhong  et  al.,  2002;  Marston  and  Sallee,  2003)  and  is 
not  limited  to  the  isolates  of  the  MIT  phage  collection.  Such  striking  g20  sequence  conservation 
from  cyanobacterial  phages  isolated  from  variable  depths  and/or  geographical  regions  suggests 
these  phages,  due  to  the  global  distribution  of  their  hosts,  might  be  exchanging  g20  genes 
throughout  a  global  pool  of  phage  genetic  information  (Hendrix  et  al.,  1999). 

With  such  a  rich  database  of  g20  sequence  information,  we  wondered  whether  the  g20 
sequence  clusters  could  be  used  to  identify  the  host  genus  or  host  strain  used  to  isolate  a  given 
phage  isolate  even  though  it  is  known  that  many  of  these  phages  can  infect  across  a  broad  range 
of hosts (Sullivan et al., 2003). While none of the three culture-containing clusters (I, II, III) was
composed solely of g20 sequences from either Prochlorococcus or Synechococcus phages, all 10
g20 sequence clusters with identical amino acid sequences from cultures and 3 of the 6
well-supported sequence clusters (as defined in Methods) were represented either by Prochlorococcus
or by Synechococcus phages, but not both. These latter observations suggest a non-random
distribution of g20 sequences within the clusters of the tree. However, it is important to bear in
mind that the designation "Prochlorococcus phage" or "Synechococcus phage" is artificial: it reflects
only the genus of the cultured host originally used to isolate the phage. In fact, with one
exception (cluster #10, Fig. 1), all phages represented in these clusters with identical g20 amino
acid sequences can infect both Prochlorococcus and Synechococcus (Sullivan et al., 2003).

An examination of the distribution of the original host strains used to isolate the phages
showed that 6 of these 10 g20 sequence clusters with identical amino acid sequences (clusters #3,
4, 6, 7, 10, 11; Fig. 1) contained phages isolated using the same original host strain, whereas the
remaining 4 identical g20 amino acid sequence clusters (clusters #5, 8, 9, 12; Fig. 1) and all 6 of
the well-supported clusters did not. Further, some hosts (e.g., NATL2A, WH7803) were used to
isolate many phages whose g20 sequences sometimes clustered together and sometimes were
scattered throughout the tree (Fig. 1). Where complete host range information was available
(i.e., phages tested against 11 Prochlorococcus and 10 Synechococcus strains), there also was no
obvious relationship between g20 phylogeny and host range (Fig. 1). Of the 9 clusters with
identical g20 amino acid sequences from cultured phages for which host range data were available,
6 contained phages with different host ranges (clusters #3, 5, 6, 7, 9, 12; Fig. 1), while in 3 the
phages had the same host range (clusters #4, 8, 10; Fig. 1).

Taken  together,  these  data  present  a  complicated  picture:  g20  clustering  sometimes  appears 
non-random,  but  often  lacks  any  obvious  correlation  with  the  host  strain  or  genus  used  to  isolate  a 
phage.  Further  work  is  needed  to  test  more  rigorously  whether  the  clustering  of  g20  sequences  from 
similar  phages  (e.g.,  phages  sharing  a  host  genus  or  a  single  host  strain)  differs  from  a  random 
distribution  of  these  sequences.  One  approach  would  be  as  follows.  There  are  39  phage  g20 
sequences  distributed  within  6  well-supported  clusters  (as  defined  in  the  methods)  and  10  clusters 
of  identical  g20  amino  acid  sequences  within  this  phylogeny.  Using  statistical  randomization 
procedures  (e.g.,  Monte  Carlo  simulations),  one  could  evaluate  whether  the  distribution  of  g20 
sequences  observed  in  our  tree  could  arise  by  chance. 
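A  minimal  sketch  of  such  a  randomization  test,  written  in  Python  for  illustration  only,  is  given 
here.  It  assumes  a  test  statistic  of  our  own  choosing  (the  number  of  within-cluster  pairs  of  phages 
that  share  an  isolation  host)  and  shuffles  host  labels  among  the  clustered  phages,  so  that  cluster 
sizes  and  host  frequencies  stay  fixed.  The  function  names  and  the  toy  data  are  hypothetical  and 
are  not  taken  from  Fig.  1. 

```python
import random
from collections import Counter


def shared_host_pairs(clusters, host_of):
    """Count pairs of phages in the same cluster that were isolated on the same host."""
    total = 0
    for members in clusters:
        counts = Counter(host_of[p] for p in members)
        total += sum(n * (n - 1) // 2 for n in counts.values())
    return total


def permutation_test(clusters, host_of, n_iter=10_000, seed=0):
    """Monte Carlo p-value: how often do shuffled host labels reach the observed statistic?"""
    rng = random.Random(seed)
    phages = [p for members in clusters for p in members]  # assumes clusters are disjoint
    hosts = [host_of[p] for p in phages]
    observed = shared_host_pairs(clusters, host_of)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(hosts)  # permute labels; cluster sizes and host counts are unchanged
        shuffled = dict(zip(phages, hosts))
        if shared_host_pairs(clusters, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_iter + 1)


# Toy example with hypothetical phage and host labels.
clusters = [["a", "b", "c"], ["d", "e"], ["f", "g"]]
host_of = {"a": "H1", "b": "H1", "c": "H1", "d": "H2", "e": "H2", "f": "H3", "g": "H1"}
observed, p_value = permutation_test(clusters, host_of, n_iter=1_000)
print(observed, p_value)
```

A  small  p-value  would  indicate  that  phages  sharing  an  isolation  host  fall  into  the  same  g20 
cluster  more  often  than  expected  by  chance;  the  same  framework  could  be  applied  to  host  genus 
or  host  range  categories  simply  by  changing  the  labels  in  host_of. 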

The  lack  of  obvious  relationships  between  g20  sequence  clustering  and  either  the  original  host 
of  isolation  or  host  range  requires  explanation.  The  host  ranges  of  these  myophages  vary  greatly, 
from  a  single  host  strain  to  strains  of  other  ecotypes  and  even  of  the  other  cyanobacterial  genus 
(Sullivan  et  al.,  2003).  Therefore  the  “true”  or  optimal  host,  if  such  a  construct  exists  for  broad 
host  range  phages,  might  be  any  one  of  the  host  strains  that  a  phage  cross-infects  (or  an 
uncultured  strain  yet  to  be  isolated).  This  considerably  complicates  any  attempt  to  link  the 
evolutionary  history  (using  g20  as  a  proxy)  of  myophages  to  that  of  their  host  of  isolation,  or  even 
to  their  apparent  range  of  hosts,  given  the  severe  limitations  in  our  understanding  of  the  in  situ 
reality  for  these  phages  in  the  environment.  Further,  the  disconnect  between  g20  sequence 
clustering  and  host  relationships  likely  reflects  the  function  of  g20.  In  coliphage  T4,  the  g20 
gene  encodes  a  portal  protein  (Marusich  and  Mesyanzhinov,  1989)  whose  functions  are  quite 
removed  from  the  direct  interaction  between  phage  and  host.  Thus,  there  is  little  reason  to  expect 
the  phylogenetic  affiliation  of  g20  sequences  to  track  host  range.  (Recall  that  in  phages,  where 
extensive  horizontal  gene  transfer  is  known  to  occur  (Hendrix  et  al.,  1999;  Hendrix,  2003),  one 
gene  might  correlate  meaningfully  with  a  given  property  while  another  gene  in  the  same  genome 
reflects  a  different  mode  of  selection.)  Host  range  and  the  proteins  mediating  it  are  dynamic 
because  of  the  ongoing  phage-host  ‘arms  race’,  while  g20  remains  highly  conserved  through 
selective  pressures  that  are  unrelated  to  host  identity.  A  better  candidate  gene  whose  phylogeny 
might  map  onto  host  range  could  be  the  distal  tail  fiber  gene,  which  is  known  to  be  a  direct 
determinant  of  host  range  in  T-even  coliphages  (Henning  and  Hashemolhosseini,  1994). 

The  exploration  of  g20  diversity  in  this  cyanophage  collection  has  expanded  the  sequence 
database  to  include  phages  isolated  using  Prochlorococcus  strains.  The  analyses  presented  here  (1) 
confirm  that  phage  culture  collections  represent  only  a  fraction  of  the  g20  sequence  diversity 
observed  in  the  field,  and  (2)  suggest  that  g20  sequence  clusters  do  not  identify  the  original  host 
of  isolation  or  host  range  of  phage  isolates.  We  are  only  beginning  to  explore  ecological 
questions  in  phage  biology  and  have  sampled  less  than  0.0002%  of  the  global  phage  metagenome 
(Rohwer,  2003).  Current  investigations  of  phage-host  interactions  are  dramatically  hampered  by 
the  lack  of  diversity  markers  for  quantitatively  and  specifically  tracking  phages  that  infect 
particular  hosts.  New  efforts  to  identify  appropriate  marker  genes  for  tracking  such  diversity  are 
critical  to  systematically  understand  phage-host  dynamics  in  the  oceans. 
Table  1:  Detailed  description  of  phage  isolates  used  in  optimization  of  g20  PCR  primers  and 
results  of  specificity  testing  of  three  PCR  primer  sets  against  various  phage  isolates. 

| Phages | Hosts | Site of isolation | Depth (m) | Date isolated | Familyᵃ | CPS 4GC/5ᵇ | CPS 1/8ᵇ | CPS 1.1/8.1ᵇ | Refᶜ |
|---|---|---|---|---|---|---|---|---|---|
| Prochlorococcus cyanophage | | | | | | | | | |
| P-SSP1 | MIT 9215 | BATS / 31°48’N, 64°16’W | 100 | 6 June 2000 | P | - | - | - | |
| P-RSP1 | MIT 9215 | Red Sea / 29°28’N, 34°53’E | 0 | 15 July 2000 | P | - | - | - | |
| P-RSP2 | MIT 9302 | Red Sea / 29°28’N, 34°53’E | 0 | 15 July 2000 | P | - | - | - | |
| P-SSP2 | MIT 9312 | BATS / 31°48’N, 64°16’W | 120 | 29 Sep 1999 | P | - | - | - | |
| P-SSP3 | MIT 9312 | BATS / 31°48’N, 64°16’W | 100 | 29 Sep 1999 | P | - | - | - | |
| P-SSP4 | MIT 9312 | BATS / 31°48’N, 64°16’W | 70 | 26 Sep 1999 | P | - | - | - | |
| P-SSP5 | MIT 9515 | BATS / 31°48’N, 64°16’W | 120 | 29 Sep 1999 | P | - | - | - | |
| P-SSP6 | MIT 9515 | BATS / 31°48’N, 64°16’W | 100 | 26 Sep 1999 | P | - | - | - | |
| P-SSP7 | MED4 | BATS / 31°48’N, 64°16’W | 100 | 26 Sep 1999 | P | - | - | - | |
| P-GSP1 | MED4 | Gulf Stream / 3802PN, 66°49’W | 40 | 6 Oct 1999 | P | - | - | - | |
| P-SSP8 | NATL2A | BATS / 31°48’N, 64°16’W | 100 | 26 Sep 1999 | P | - | - | - | |
| P-RSP3 | NATL2A | Red Sea / 29°28’N, 34°55’E | 50 | 13 Sep 2000 | P | - | - | - | |
| P-SP1 | SS120 | Slope / 37°40’N, 73°30’W | 83 | 17 Sep 2001 | P | - | - | - | |
| P-SSM8 | MIT 9211 | W Sargasso Sea / 34°24’N, 72°03’W | 30 | 22 Sep 2001 | M | + | + | + | |
| P-SSM1 | MIT 9303 | BATS / 31°48’N, 64°16’W | 100 | 6 June 2000 | M | + | - | + | |
| P-RSM1 | MIT 9303 | Red Sea / 29°28’N, 34°53’E | 0 | 15 July 2000 | M | + | - | + | |
| P-RSM4 | MIT 9303 | Red Sea / 29°28’N, 34°55’E | 130 | 13 Sep 2000 | M | + | + | + | |
| P-ShM1 | MIT 9313 | Shelf / 39°60’N, 71°48’W | 40 | 16 Sep 2001 | M | + | - | + | |
| P-ShM2 | MIT 9313 | Shelf / 39°60’N, 71°48’W | 0 | 16 Sep 2001 | M | - | - | + | |
| P-SSM2 | NATL1A | BATS / 31°48’N, 64°16’W | 100 | 6 June 2000 | M | + | + | + | |
| P-RSM5 | NATL1A | Red Sea / 29°28’N, 34°55’E | 130 | 13 Sep 2000 | M | + | + | + | |
| P-SSM7 | NATL1A | BATS / 31°48’N, 64°16’W | 120 | 29 Sep 1999 | M | - | - | + | |
| P-SSM3 | NATL2A | BATS / 31°48’N, 64°16’W | 100 | 6 June 2000 | M | - | - | + | |
| P-SSM4 | NATL2A | BATS / 31°48’N, 64°16’W | 10 | 6 June 2000 | M | - | - | + | |
| P-SSM5 | NATL2A | BATS / 31°48’N, 64°16’W | 15 | 26 Sep 1999 | M | + | - | + | |
| P-SSM6 | NATL2A | BATS / 31°48’N, 64°16’W | 40 | 29 Sep 1999 | M | - | - | + | |
| P-RSM2 | NATL2A | Red Sea / 29°28’N, 34°55’E | 50 | 13 Sep 2000 | M | + | - | + | |
| P-RSM3 | NATL2A | Red Sea / 29°28’N, 34°55’E | 50 | 13 Sep 2000 | M | - | - | + | |
| P-SSM9 | NATL2A | W Sargasso Sea / 34°24’N, 72°03’W | 0 | 22 Sep 2001 | M? | + | - | + | |
| P-SSM10 | NATL2A | W Sargasso Sea / 34°24’N, 72°03’W | 0 | 22 Sep 2001 | M? | + | - | + | |
| P-SSM11 | NATL2A | W Sargasso Sea / 34°24’N, 72°03’W | 0 | 22 Sep 2001 | M? | + | - | + | |
| P-SSM12 | NATL2A | W Sargasso Sea / 34°24’N, 72°03’W | 95 | 22 Sep 2001 | M? | + | + | | |
| Synechococcus cyanophage | | | | | | | | | |
| Syn5 | WH 8109 | Sargasso Sea / 36°58’N, 73°42’W | 0 | Dec 1990 | P | - | - | - | |
| Syn12 | WH 8017 | Gulf Stream / 34°06’N, 61°01’W | 0 | July 1990 | P | - | - | - | |
| S-SM1 | WH 6501 | Slope / 37°40’N, 73°30’W | 0 | 17 Sep 2001 | M | - | - | + | |
| S-ShM1 | WH 6501 | Shelf / 39°60’N, 71°48’W | 0 | 16 Sep 2001 | M | + | + | + | |
| S-SSM1 | WH 6501 | W Sargasso Sea / 34°24’N, 72°03’W | 70 | 22 Sep 2001 | M | + | + | + | |
| Syn 2 | WH 8012 | Sargasso Sea / 34°06’N, 61°01’W | 0 | July 1990 | M | - | + | + | |
| Syn 9 | WH 8012 | Woods Hole / 41°31’N, 71°40’W | 0 | Oct 1990 | M | + | + | + | |
| Syn 10 | WH 8017 | Gulf Stream / 36°58’N, 73°42’W | 0 | Dec 1990 | M | + | + | + | |
| Syn 26 | WH 8017 | NE Providence Channel / 25°53’N, 77°34’W | 0 | Jan 1992 | M | + | + | + | |
| S-SM2 | WH 8017 | Slope / 37°40’N, 73°30’W | 15 | 17 Sep 2001 | M | + | - | + | |
| Syn30 | WH 8018 | NE Providence Channel / 25°53’N, 77°34’W | 0 | Jan 1992 | M | + | - | + | |
| S-SSM3 | WH 8018 | W Sargasso Sea / 34°24’N, 72°03’W | 0 | 22 Sep 2001 | M | + | + | + | |
| S-SSM4 | WH 8018 | W Sargasso Sea / 34°24’N, 72°03’W | 110 | 22 Sep 2001 | M | + | + | + | |
| S-RIM3 | WH 8018 | Mt. Hope Bay, RI / 41°39’N, 71°15’W | 0 | Sep 1999 | M? | + | - | + | |
| Syn 33 | WH 7803 | Gulf Stream / 25°51’N, 79°26’W | 0 | Jan 1995 | M | + | + | + | |
| S-PM2 | WH 7803 | English Channel / 50°18’N, 4°12’W | 0 | 23 Sep 1992 | M | + | + | + | |
| S-WHM1 | WH 7803 | Woods Hole / 41°31’N, 71°40’W | 0 | 11 Aug 1992 | M | + | + | + | |
| S-RIM9 | WH 7803 | Mt. Hope Bay, RI / 41°39’N, 71°15’W | 0 | May 2000 | M? | + | - | + | |
| S-RIM17 | WH 7803 | Mt. Hope Bay, RI / 41°39’N, 71°15’W | 0 | July 2001 | M? | + | - | + | |
| S-RIM24 | WH 7803 | Mt. Hope Bay, RI / 41°39’N, 71°15’W | 0 | Dec 2001 | M? | + | - | + | |
| S-RIM30 | WH 7803 | Mt. Hope Bay, RI / 41°39’N, 71°15’W | 0 | June 2002 | M? | + | - | + | |
| Syn 1 | WH 8101 | Woods Hole / 41°31’N, 71°40’W | 0 | Aug 1990 | M | + | - | + | 3 |
| S-ShM2 | WH 8102 | Shelf / 39°60’N, 71°48’W | 0 | 16 Sep 2001 | M | + | + | + | 1 |
| S-SSM2 | WH 8102 | W Sargasso Sea / 34°24’N, 72°03’W | 0 | 22 Sep 2001 | M | + | + | + | 1 |
| S-SSM5 | WH 8102 | W Sargasso Sea / 34°24’N, 72°03’W | 95 | 22 Sep 2001 | M | + | + | + | 2 |
| Syn 19 | WH 8109 | Sargasso Sea / 34°06’N, 61°01’W | 0 | July 1990 | M | - | - | + | 3 |
| S-SSM6 | WH 8109 | W Sargasso Sea / 34°24’N, 72°03’W | 70 | 22 Sep 2001 | M | + | + | + | 2 |
| S-SSM7 | WH 8109 | W Sargasso Sea / 34°24’N, 72°03’W | 95 | 22 Sep 2001 | M | + | + | + | 2 |
| Other phages | | | | | | | | | |
| IH6-01 | IH6 | Inner Harbor, Baltimore, MD | 0 | 17 Nov 2000 | M | - | - | - | 6 |
| IH6-Φ7 | IH6 | Inner Harbor, Baltimore, MD | 0 | 17 Nov 2000 | P | - | - | - | 6 |
| IH1142 | Alteromonas | Inner Harbor, Baltimore, MD | 0 | 17 Nov 2000 | M | - | - | - | 6 |
| IH11-Φ5 | Alteromonas | Inner Harbor, Baltimore, MD | 0 | 17 Nov 2000 | P | - | - | - | 6 |
| CB8-02 | CB8 | Chesapeake Bay, MD | 0 | 17 Nov 2000 | M | - | - | - | 6 |
| CB8-Φ6 | CB8 | Chesapeake Bay, MD | 0 | 17 Nov 2000 | M | - | - | - | 6 |
| CB-08 | Vibrio alginolyticus | Chesapeake Bay, MD | 0 | 17 Nov 2000 | M | - | - | - | 6 |
| HER320 | H7 | Helgoland, North Sea | 0 | 1976-1978 | M | - | - | - | 7 |
| HER321 | H100 | Helgoland, North Sea | 0 | 1976-1978 | P | - | - | - | 7 |
| HER322 | H100 | Helgoland, North Sea | 0 | 1976-1978 | M | - | - | - | 7 |
| HER327 | 11-68 | Helgoland, North Sea | 0 | 1976-1978 | S | - | - | - | 7 |
| HER328 | H105 | Helgoland, North Sea | 0 | 1976-1978 | S | - | - | - | 7 |

ᵃ  M,  P  and  S  represent  the  virus  families  Myoviridae,  Podoviridae  and  Siphoviridae,  respectively. 
A  question  mark  (?)  indicates  that  the  morphology  of  the  phage  particle  has  not  been  confirmed  by 
electron  microscopy;  the  phage  is  presumed  to  belong  to  the  Myoviridae  based  on  amplification 
and  sequencing  of  a  g20  PCR  product. 

ᵇ  +,  positive  PCR  amplification;  -,  no  desired  PCR  product. 

ᶜ  Reference  code:  1  =  Sullivan  et  al.,  2003;  2  =  this  study;  3  =  Waterbury  &  Valois,  1993;  4  = 
Marston  &  Sallee,  2003;  5  =  Wilson  et  al.,  1993;  6  =  Zhong  et  al.,  2002;  7  =  Wichels  et  al.,  1998. 
Figure  1:  Evolutionary  relationships  determined  using  183  amino  acid  positions  of  the  portal 
protein  encoded  by  g20,  amplified  from  MIT  myophage  isolates  (colored  and  italicized), 
previously  characterized  myophage  isolates  (colored),  and  environmental  g20  sequences  (Zhong 
et  al.,  2002;  Marston  and  Sallee,  2003).  The  tree  shown  was  inferred  by  neighbor-joining  as 
described  in  the  methods.  Support  values  shown  at  the  nodes  are  neighbor-joining  bootstrap  / 
maximum  parsimony  bootstrap  /  maximum  likelihood  quartet  puzzling  support  (values  less  than 
50  are  designated  with  a  dash).  Well-supported  nodes  (as  defined  in  the  methods)  are  designated 
by  italicized  support  values.  Clusters  were  assigned  as  designated  by  Zhong  et  al.  (2002);  clusters 
I,  II  and  III  contain  g20  sequences  from  cultured  phage  isolates,  while  clusters  A-F  represent 
environmental  g20  sequences.  Clusters  of  identical  g20  sequences  are  numbered  1-13.  For 
cultured  phages,  colored  isolate  names  indicate  whether  they  were  originally  isolated  using  a 
Synechococcus  (orange)  or  Prochlorococcus  (green)  host;  black  lettering  following  phage  isolate 
names  indicates  the  original  host  strain  used  for  isolation.  Colored  dots  indicate  that  the  phage 
cross-infects  at  least  one  strain  of  high-light  adapted  Prochlorococcus  (blue),  low-light  adapted 
Prochlorococcus  (green)  or  Synechococcus  (orange),  whereas  colored  dashes  indicate  no 
cross-infection  of  those  groups.  Isolates  not  available  for  host  range  testing  have  no  indication  of 
their  host  range. 
Whole  genome  sequencing  gives  us  a  new  window  for  viewing  phage-host  interactions  and 
their  evolutionary  implications.  Here  we  report  the  presence  of  genes  central  to  oxygenic 
photosynthesis  in  the  genomes  of  three  phage  from  2  families  of  viruses  (Myoviridae, 
Podoviridae)  that  infect  the  marine  cyanobacterium  Prochlorococcus.  The  gene  that  encodes 
the  photosystem  II  (PSII)  core  reaction  center  protein  D1  (psbA)  and  a  gene  (hli)  that 
encodes  one  type  of  high  light  inducible  protein  (HLIP)  are  present  in  all  3  phage  genomes. 

The  two  myoviruses  contain  additional  hli  gene  types;  one  of  them  contains  psbD,  which 
encodes  the  second  PSII  core  reaction  center  protein,  D2,  and  the  other  contains 
photosynthetic  electron  transport  genes  coding  for  plastocyanin  (petE)  and  ferredoxin 
(petF).  These  uninterrupted,  full-length  genes  are  conserved  in  their  amino  acid  sequences, 
suggesting  that  they  encode  functional  proteins  that  may  help  maintain  photosynthetic 
activity  during  infection.  Phylogenetic  analyses  show  that  phage  D1,  D2  and  HLIP  proteins 
cluster  with  those  from  Prochlorococcus,  indicating  that  they  are  of  cyanobacterial  origin. 
Their  distribution  among  several  Prochlorococcus  clades  further  suggests  that  the  genes 
encoding  these  proteins  were  transferred  from  host  to  phage  multiple  times.  Phage  HLIPs 
cluster  with  multicopy  types  found  exclusively  in  Prochlorococcus,  suggesting  that  phage 
may  be  mediating  the  expansion  of  the  hli  gene  family  by  transferring  these  genes  back  to 
their  hosts  after  a  period  of  evolution  in  the  phage.  These  gene  transfers  are  likely  to  influence 
the  fitness  landscape  of  both  host  and  phage  in  the  surface  oceans. 
The  genomes  of  viruses  (phage)  that  infect  bacteria  contain  a  variety  of  genes  homologous  to 
those  found  in  their  bacterial  hosts  (1-5).  Many  encode  functional  proteins  involved  in  processes 
of  direct  importance  for  the  production  of  phage  progeny.  They  include  genes  involved  in  DNA 
replication,  nucleotide  metabolism  and  RNA  transcription  and  are  found  in  the  genomes  of  both 
lytic  phage  and  prophage  (3,  6).  It  is  likely  that  many  originated  from  their  hosts  (2, 4),  and  that 
some  host  genes  that  occur  in  multiple  copies  have  been  (re)acquired  from  phage  (2,  7)  —  either 
after  a  period  of  evolution  in  the  phage  or  after  acquisition  of  the  gene  from  a  different  host. 

Host  genes  that  are  not  directly  related  to  the  production  of  new  phage,  such  as  genes  involved  in 
phosphate  sensing  and  metabolism  (8,  9)  and  the  scavenging  of  oxygen  radicals  (10),  are  also 
found  in  phage  genomes,  and  may  benefit  phage  by  temporarily  enhancing  host  functionality 
prior  to  lysis.  In  addition,  prophage  can  provide  their  hosts  with  new  functions  by  encoding  genes 
not  otherwise  found  in  the  host’s  genome,  such  as  virulence  factors,  toxin  production  genes,  and 
genes  involved  in  the  immune  response  to  pathogenic  infection  (5,  6,  11). 

Genes  involved  in  photosynthesis  have  recently  been  found  in  a  lytic  phage  isolated  on 
Synechococcus  WH7803  (12),  a  member  of  the  marine  cluster  A  unicellular  cyanobacteria  that  are 
widespread  in  the  oceans.  This  phage,  a  member  of  the  Myoviridae  family  of  dsDNA  viruses, 
contains  2  photosynthetic  genes  (psbD  and  an  interrupted  psbA  gene)  that  code  for  the  2 
photosystem  II  (PSII)  core  reaction  center  proteins  found  in  all  oxygenic  photosynthetic 
organisms.  These  genes  were  not  found  in  another  phage,  a  member  of  the  Podoviridae  family, 
isolated  on  the  same  strain  of  Synechococcus  (13).  This  raises  the  question  of  whether  the 
presence  of  photosynthetic  genes  in  phage  is  a  rare  phenomenon,  and  to  what  extent  it  is  specific 
to  a  particular  phage  or  host  type.  If  these  genes  are  widespread  and  diverse  in  cyanophage,  what 
is  their  origin?  Were  they  acquired  through  a  single  ancestral  transfer  event? 

The  phage-host  system  for  Prochlorococcus  and  Synechococcus  (14,  15),  which  form  a 
monophyletic  clade  within  the  cyanobacteria  (16-19),  is  well  suited  to  begin  to  answer  these 
questions.  Members  of  each  genus  form  distinct  sub-generic  clusters  within  this  clade,  which  in 
Prochlorococcus  also  correspond  to  their  efficiency  of  light  utilization  (17).  Numerous  phage 
have  been  isolated  using  this  diverse  group,  including  members  of  the  Myoviridae,  Podoviridae 
and  Siphoviridae  families,  and  the  degree  of  cross-infection  (a  mechanism  for  horizontal  gene 
transfer)  among  and  between  strains  has  been  analyzed  (14,  15).  The  genomes  of  4  of  the  host 
strains  have  been  published  (20-22),  and  the  genomes  of  three  phage  have  been  sequenced  by  the 
US  Department  of  Energy  Joint  Genome  Institute  (DOE  JGI;  www.jgi.doe.gov),  providing  a 
database  with  which  to  begin  analyzing  the  distribution  of  host  genes  among  hosts  and  the  phage 
that  infect  them,  and  to  explore  their  phylogenetic  relationships. 

Here  we  report  that  the  genomes  of  3  phage  that  infect  Prochlorococcus  collectively  contain  a 
number  of  host-like  photosynthetic  genes.  We  further  infer  from  bioinformatic  analyses  that  they 
are  likely  to  play  a  functional  role  during  infection,  as  well  as  impact  the  evolutionary  trajectory 
of  both  phage  and  host  in  the  surface  oceans. 

Materials  and  Methods 

Selection  and  preparation  of  cyanophage  for  genome  sequencing 

Two  myoviruses  and  1  podovirus  were  chosen  for  sequencing  based  on  their  host  range  within 
Prochlorococcus,  with  no  prior  knowledge  of  their  gene  content.  The  podovirus  P-SSP7  (43  kb 
genome)  infects  a  single  high-light  adapted  (HL)  Prochlorococcus  strain.  The  myovirus  P-SSM2 
(252  kb)  infects  3  low-light  adapted  (LL)  Prochlorococcus  strains  and  the  P-SSM4  myovirus 
(178  kb)  infects  2  HL  and  2  LL  Prochlorococcus  strains  ((15);  see  Table  I).  None  of  these  phage 
cross-infect  any  of  the  Synechococcus  strains  tested. 

Phage  were  propagated  on  their  respective  Prochlorococcus  hosts  (P-SSP7  on  MED4,  P-SSM2 
on  NATL1A,  P-SSM4  on  NATL2A)  and  were  purified  for  DNA  extraction  and  the  construction 
of  clone  libraries  as  described  previously  (8).  Briefly,  1  L  of  lysed  culture  was  treated  with  DNase 
and  RNase  to  degrade  host  nucleic  acids.  Cell  debris  was  removed  and  the  phage  remaining  in 
the  supernatant  were  precipitated  using  PEG  8000.  Concentrated  phage  were  purified  on  a  cesium 
chloride  step  gradient  (steps  of  ρ  =  1.30,  1.40,  1.50  and  1.65  g  ml⁻¹;  centrifuged  for  2  h  at  4°C  and 
104,000  ×  g)  and  dialyzed  against  a  buffer  containing  100  mM  Tris-Cl  (pH  7.5),  100  mM  MgSO4 
and  30  mM  NaCl.  Purified  phage  were  burst  using  SDS  (0.5%)  and  proteinase  K  (50  µg  ml⁻¹), 
and  DNA  was  extracted  with  phenol:chloroform  and  concentrated  by  ethanol  precipitation.  A 
custom  LASL  clone  library  was  constructed  by  Lucigen  Inc  (Middleton,  WI)  as  described 
previously  (23).  Inserts  were  sequenced  and  genomes  assembled  by  the  DOE  JGI.  All  analyses 
were  conducted  on  the  phage  genomes  as  provided  on  17  Oct  03  (P-SSM2,  P-SSM4)  and  19  Nov  03 
(P-SSP7).  At  that  time,  these  genomes  were  in  large  high-quality  contigs  compiled  from  26-fold 
(P-SSP7),  30-fold  (P-SSM2)  and  39-fold  (P-SSM4)  coverage,  respectively.  The  P-SSP7  and 
P-SSM4  genomes  were  not  yet  closed. 
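As  a  rough  check  on  what  these  coverage  figures  imply,  fold  coverage  multiplied  by  genome 
length  gives  the  total  number  of  sequenced  bases.  The  Python  sketch  below  uses  the  approximate 
genome  sizes  given  above  (43,  252  and  178  kb);  the  700  bp  read  length  is  a  hypothetical  value 
chosen  for  illustration,  not  one  reported  by  the  DOE  JGI,  and  the  P-SSP7  and  P-SSM4  sizes  are 
approximate  because  those  genomes  were  not  yet  closed. 

```python
import math

# Approximate genome sizes (bp) and fold coverage as reported in the text.
GENOMES = {
    "P-SSP7": (43_000, 26),
    "P-SSM2": (252_000, 30),
    "P-SSM4": (178_000, 39),
}
READ_LENGTH = 700  # hypothetical average read length (bp), assumed for illustration


def bases_sequenced(genome_bp, coverage):
    """Total sequenced bases implied by a given fold coverage."""
    return genome_bp * coverage


def reads_needed(genome_bp, coverage, read_len=READ_LENGTH):
    """Minimum number of reads of length read_len needed to reach that coverage."""
    return math.ceil(genome_bp * coverage / read_len)


for phage, (size, cov) in GENOMES.items():
    print(phage, bases_sequenced(size, cov), reads_needed(size, cov))
```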

PCR  amplification  of  psbA 

Genomic  DNA  was  isolated  from  Prochlorococcus  cultures  using  the  DNeasy  kit  (Qiagen, 
Valencia,  CA).  Partial  psbA  sequences  were  amplified  using  primers  from  (19)  or,  for 
Prochlorococcus  MIT9211,  using  the  following  primers:  (5’-AACATCATYTCWGGTGCWGT-3’) 
and  (5’-TCGTGCATTACTTCCATACC-3’).  Reactions  (50  µl)  consisted  of  4  mM  MgCl2,  200  µM 
each  dNTP,  0.25  µM  each  primer,  2.5  units  Taq  DNA  polymerase  (Invitrogen,  Carlsbad,  CA)  and 
4  ng  genomic  DNA.  Amplification  conditions,  run  on  a  RoboCycler  Gradient  96  thermocycler 
(Stratagene,  La  Jolla,  CA),  comprised  an  initial  step  at  92°C  for  4  min,  35  cycles  of  92°C  for 
1  min,  50°C  for  1  min  and  68°C  for  1  min,  followed  by  a  final  extension  step  at  68°C  for  10  min. 
PCR  products  were  gel  purified  and  sequenced  in  both  the  forward  and  reverse  directions  (Davis 
Sequencing). 
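The  MIT9211  forward  primer  is  degenerate:  Y  and  W  are  IUPAC  ambiguity  codes  (Y  =  C  or  T; 
W  =  A  or  T),  so  the  oligonucleotide  is  really  a  mixture  of  concrete  sequences.  The  Python  sketch 
below,  with  illustrative  helper  names  of  our  own,  expands  a  degenerate  primer  into  every  variant 
it  encodes  and  reports  the  range  of  GC  content  across  them. 

```python
from itertools import product

# IUPAC nucleotide ambiguity codes: each symbol maps to the bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def gc_fraction(seq):
    """Fraction of G and C bases in a concrete (non-degenerate) sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


forward = "AACATCATYTCWGGTGCWGT"  # MIT9211 psbA forward primer from the text
reverse = "TCGTGCATTACTTCCATACC"  # MIT9211 psbA reverse primer from the text

variants = expand_degenerate(forward)
print(len(variants))  # one Y and two W positions: 2 * 2 * 2 = 8 variants
print(min(map(gc_fraction, variants)), max(map(gc_fraction, variants)))
```

If  synthesis  is  equimolar  at  each  degenerate  position  (an  assumption),  each  of  the  eight  forward 
variants  makes  up  about  one  eighth  of  the  0.25  µM  forward  primer  in  the  reaction. 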

Identification  of  genes  and  transcriptional  regulatory  elements 

Assembled  phage  genomes  were  examined  to  identify  open  reading  frames  (ORFs)  using 
GeneMark  (24).  Gene  identifications  were  based  on  homology  to  known  proteins  using  the  blastp 
program  (ftp://ftp.ncbi.nih.gov/blast)  with  an  E-value  cut-off  of  10⁻⁵.  Additional  criteria  were 
used  to  identify  genes  encoding  ferredoxin  and  high  light  inducible  proteins  (HLIPs)  because  of 
their  small  size  and  sequence  divergence.  Ferredoxin-encoding  genes  (petF)  were  included  in  our 
analyses  if  they  encoded  the  2Fe-2S  iron-sulfur  cluster  binding  domain  (fer2)  with  an  E-value 
<10⁻¹⁰,  as  determined  with  the  NCBI  conserved  domain  database  blast  tool  (rpsblast).  HLIP- 
encoding  genes  (hli)  were  considered  present  for  this  study  if  they  encoded  at  least  6  of  the  10 
conserved  amino  acids  in  the  motif  AExxNGRxAMIGF  (25).  Bhaya  et  al.  (26)  report  that  many 
Prochlorococcus  hli  genes  code  for  a  conserved  9  amino  acid  C-terminal  sequence,  with  the 
consensus  sequence  TGQIIPGI/FF.  For  this  study,  this  sequence  was  defined  as  present  when  at 
least  6  of  the  9  conserved  amino  acids  were  found  at  the  C-terminus  of  the  HLIP. 
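The  two  HLIP  criteria  amount  to  counting  matches  at  fixed  positions.  The  Python  sketch  below 
illustrates  one  way  to  apply  them;  it  assumes  ungapped,  position-by-position  comparison  (the 
text  does  not  say  whether  gaps  were  allowed),  and  the  function  names  and  test  sequences  are 
hypothetical.  The  three  x  positions  of  AExxNGRxAMIGF  are  not  scored,  which  leaves  the  10 
conserved  residues  referred  to  above. 

```python
# Conserved residues of the HLIP motif AExxNGRxAMIGF (25); 'x' positions are not scored,
# leaving 10 scored positions.
HLIP_MOTIF = "AExxNGRxAMIGF"

# C-terminal consensus TGQIIPGI/FF (26); the eighth position may be I or F.
C_TERMINAL = ["T", "G", "Q", "I", "I", "P", "G", "IF", "F"]


def best_motif_matches(protein, motif=HLIP_MOTIF):
    """Highest number of matching conserved residues over all ungapped windows."""
    best = 0
    for start in range(len(protein) - len(motif) + 1):
        window = protein[start:start + len(motif)]
        score = sum(1 for m, aa in zip(motif, window) if m != "x" and m == aa)
        best = max(best, score)
    return best


def has_hlip_motif(protein, min_matches=6):
    """True if some window matches at least min_matches of the 10 conserved residues."""
    return best_motif_matches(protein.upper()) >= min_matches


def has_c_terminal_consensus(protein, min_matches=6):
    """Check the last nine residues against the consensus, position by position."""
    tail = protein.upper()[-len(C_TERMINAL):]
    if len(tail) < len(C_TERMINAL):
        return False
    matches = sum(1 for allowed, aa in zip(C_TERMINAL, tail) if aa in allowed)
    return matches >= min_matches
```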

Rho-independent  transcriptional  terminators  were  identified  using  the  TransTerm  program  (27), 
and  all  had  an  energy  score  of  <-10  and  a  tail  score  of  <-5.  Potential  bacterial  sigma-70  promoters 
were  identified  in  intergenic  regions  using  the  program  BPROM  (www.softberry.com).  Promoter 
sequences  reported  here  had  a  linear  discriminant  function  score  greater  than  2.5.  While  the 
identification  of  terminators  is  quite  robust,  the  detection  of  potential  promoters  in  cyanophage  is 
more  precarious,  as  our  ability  to  predict  both  cyanophage  and  cyanobacterial  promoter  elements 
is  presently  limited. 

Sequence  manipulation  and  analyses 

Sequences  were  initially  aligned  using  Clustal  X  and  edited  manually  when  necessary.  Amino 
acid  alignments  served  as  the  basis  for  the  manual  alignment  of  nucleotide  sequences.  Regions 
that  could  not  be  confidently  aligned  were  excluded  from  analyses,  as  were  gaps.  The  divergence 
estimator  program  K-estimator  6.0  (28)  was  used  to  estimate  the  frequency  of  synonymous  and 
non-synonymous  nucleotide  substitutions;  it  employs  the  Kimura  two-parameter  correction  for 
multiple  hits. 
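For  reference,  the  Kimura  two-parameter  (K2P)  distance  between  two  aligned  nucleotide 
sequences  is  d  =  −½  ln(1  −  2P  −  Q)  −  ¼  ln(1  −  2Q),  where  P  and  Q  are  the  proportions  of  sites 
differing  by  a  transition  and  a  transversion,  respectively.  The  Python  sketch  below  applies  this 
correction  to  a  whole  sequence  pair;  it  illustrates  the  multiple-hit  correction  only,  and  does  not 
reproduce  K-estimator’s  separate  handling  of  synonymous  and  non-synonymous  sites.  The 
function  name  is  our  own. 

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with gaps or ambiguous bases are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

With  one  transition  in  ten  sites  (P  =  0.1,  Q  =  0),  the  corrected  distance  is  about  0.112,  slightly 
larger  than  the  raw  proportion  of  differences,  reflecting  substitutions  hidden  by  multiple  hits. 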

The  PAUP*  4.0b10  package  was  used  for  the  construction  of  distance  and  maximum  parsimony 
trees.  Amino  acid  distance  trees  were  inferred  using  minimum  evolution  as  the  objective  function 
and  mean  distances.  Heuristic  searches  were  performed  with  100  random-addition-sequence 
replicates  and  the  tree-bisection-reconnection  branch-swapping  algorithm.  Starting  trees  were 
obtained  by  stepwise  addition  of  sequences.  Bootstrap  analyses  of  100  resamplings  were  carried 
out.  Maximum  likelihood  trees  were  constructed  using  TREE-PUZZLE  5.0.  Evolutionary 
distances  were  calculated  using  either  the  JTT  model  of  substitution  (for  D1,  D2  and  ferredoxin) 
or  the  VT  model  of  substitution  (for  the  highly  divergent  HLIPs),  assuming  a  gamma-distributed 
model  of  rate  heterogeneity  with  16  gamma-rate  categories  and  the  shape  parameter  estimated 
empirically  from  the  data.  Quartet  puzzling  support  was  estimated  from  10,000  replicates. 
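Bootstrap  support  is  the  percentage  of  pseudo-replicate  alignments,  built  by  resampling 
alignment  columns  with  replacement,  that  recover  a  given  grouping.  The  Python  sketch  below 
illustrates  only  the  resampling  logic:  as  a  deliberately  simplified  stand-in  for  rebuilding  a  full  tree 
in  PAUP*,  it  scores  whether  a  chosen  pair  of  sequences  is  the  closest  pair  (smallest  p-distance)  in 
each  replicate.  The  function  names  and  the  toy  alignment  are  hypothetical. 

```python
import random
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of aligned positions that differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def closest_pair(alignment):
    """Pair of taxa with the smallest p-distance (ties broken by name order)."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


def resample_columns(alignment, rng):
    """One bootstrap pseudo-replicate: draw alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


def bootstrap_support(alignment, pair, n_reps=100, seed=0):
    """Percentage of replicates in which `pair` is the closest pair of taxa."""
    rng = random.Random(seed)
    target = tuple(sorted(pair))
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == target
        for _ in range(n_reps)
    )
    return 100 * hits / n_reps
```

The  default  of  100  replicates  mirrors  the  number  of  resamplings  used  here;  a  real  analysis  would 
rebuild  the  full  tree  for  each  replicate  and  count  how  often  each  clade  appears. 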

In  cases  where  phylogenetic  analyses  of  small  genes  received  low  bootstrap  support,  we  used  the 
GeneRAGE  clustering  tool  (29),  which  clusters  protein  sequences  with  significant  relationships  at 
user-defined  E-value  thresholds.  The  input  to  GeneRAGE  was  an  all-against-all  table  of  blastp 
(NCBI)  comparisons  of  amino  acid  sequences.  GeneRAGE  uses  a  Smith-Waterman  dynamic 
programming  alignment  algorithm  to  correct  for  false-positive  linkages  whenever  the  pairwise 
relationships  are  not  symmetrical.  For  HLIPs,  an  E-value  cutoff  of  10⁻¹⁴  was  used.  The  clusters 
containing  the  phage  HLIPs  were  preserved  down  to  an  E-value  cutoff  of  10⁻¹⁷.  For  plastocyanin 
and  ferredoxin,  E-value  cutoffs  of  10⁻²⁶  and  10⁻³⁴,  respectively,  linked  the  phage  proteins  with  a 
large  group  of  proteins,  whereas  at  E-value  cutoffs  of  10⁻²⁸  and  10⁻³⁶  the  respective  phage 
proteins  did  not  cluster  with  other  sequences. 

Results 

A  suite  of  host  photosynthesis  genes  was  found  in  the  three  phage  genomes  (Fig.  1).  The  psbA 
gene,  encoding  the  PSII  core  reaction  center  protein  D1,  and  one  hli  gene  type,  encoding  the 
HLIP  cluster  14  type  protein  (sensu  (26)),  were  present  in  all  3  phage.  HLIPs  are  thought  to  be 
involved  in  protecting  the  photosynthetic  apparatus  from  excess  excitation  energy  during  stressful 
conditions  in  cyanobacteria  (30).  Other  photosynthesis-related  genes  were  present  in  the  two 
myovirus  genomes  (Fig.  1B,  C).  Phage  P-SSM4  contains  the  psbD  gene  encoding  the  second  PSII 
core  reaction  center  protein,  D2,  while  phage  P-SSM2  contains  two  photosynthetic  electron 
transport  genes  coding  for  plastocyanin  (petE)  and  ferredoxin  (petF).  Both  phage  contain 
additional  gene  types  from  the  hli  multigene  family. 

The  deduced  amino  acid  sequences  of  the  phage  photosynthesis  genes  are  highly  conserved,  and  the 
genes  therefore  have  the  potential  to  encode  functional  proteins.  The  coding  sequences  of  all  of 
these  genes  are  uninterrupted  and  show  a  high  degree  of  identity  to  their  host  homologs  (up  to  85% 
nucleotide  and  95%  amino  acid  identity;  Suppl.  Table  II,  Suppl.  Figs.  4-8).  For  the  phage  D1  and 
D2  proteins,  the  greatest  amino  acid  divergence  is  in  the  N-terminal  leader  sequences,  which  do  not 
form  part  of  the  functional  protein.  Furthermore,  divergence  analyses  based  on  estimates  of  the 
frequency  of  non-synonymous  (Ka)  and  synonymous  (Ks)  nucleotide  substitutions  between 
phage-  and  host-encoded  genes  revealed  Ka/Ks  ratios  of  less  than  0.45  for  all  genes,  with  values  of 
<0.1  for  psbA  and  psbD,  similar  to  those  among  Prochlorococcus  genes  (Suppl.  Table  III).  This 
indicates  that  the  majority  of  nucleotide  substitutions  did  not  cause  a  change  in  amino  acid 
sequence.  These  findings  suggest  that  the  phage-encoded  genes,  particularly  psbA  and  psbD,  have 
been  subjected  to  strong  selective  pressure  to  conserve  their  amino  acid  sequences,  which  is 
consistent  with  the  hypothesis  that  they  are  functional. 

All  of  the  photosynthesis  genes,  with  the  exception  of  the  plastocyanin  gene,  petE,  are  arranged 
together  in  the  phage  genomes,  suggesting  that  they  may  be  expressed  at  a  similar  stage  of 
infection  (3,  31).  In  addition,  identification  of  potential  promoter  and  terminator  elements 
suggests  that  distinct  transcriptional  units  are  found  within  these  genomic  regions.  In  the  genome 
of  P-SSP7,  for  example,  psbA  and  the  single  hli  gene  may  be  co-transcribed  with  the  adjacent 
phage  structural  genes  in  a  single  operon.  Most  of  the  genes  in  this  region  have  overlapping  start 
and  stop  codons  and  are  flanked  by  a  putative  sigma-70  transcriptional  promoter  and  rho- 
independent  transcriptional  terminator  (Fig.  1A).  This  arrangement  further  suggests  that  the 
photosynthesis  genes  are  expressed  in  the  latter  portion  of  the  lytic  cycle,  as  is  known  for 
structural  proteins  in  other  T7-like  podoviruses  (31).  In  contrast,  the  presence  of  transcriptional 
terminators  flanking  the  regions  containing  photosynthetic  genes  in  the  myoviruses  suggests  that 
they  may  be  transcribed  as  discrete  transcriptional  units  largely  independent  of  the  surrounding 
phage  genes. 

The cyanobacterial origin of the phage psbA and psbD genes is suggested by features shared by
phage and host genes. Phage psbA genes encode a 7 amino acid indel close to the carboxy terminus
of the D1 protein (Suppl. Fig. 4) that is almost exclusively found in cyanobacterial D1 proteins.
Similarly, the phage psbD gene encodes a 7 amino acid indel in the center of the D2 protein that is
also found in Prochlorococcus MED4 and SS120 (but not in other cyanobacterial or eukaryotic D2
proteins) (Suppl. Fig. 5). Interestingly, neither the psbD gene from Synechococcus WH8102 nor that
from the Synechococcus phage S-PM2 encodes these additional amino acids (Suppl. Fig. 5). These
findings suggest that Prochlorococcus phage acquired psbD from Prochlorococcus, and that
Synechococcus phage acquired this gene from Synechococcus.
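Shared indels such as these are read from a multiple alignment as runs of columns in which some sequences carry residues and the others carry gaps. The sketch below makes several assumptions. It takes `-` as the gap character. `insertion_blocks` is a hypothetical helper. The toy alignment is invented and is not the D2 alignment of Suppl. Fig. 5.

```python
from typing import Dict, List, Optional, Tuple


def insertion_blocks(
    alignment: Dict[str, str], min_len: int = 1, gap: str = "-"
) -> List[Tuple[int, int, List[str]]]:
    """Find runs of alignment columns where some, but not all, sequences have residues.

    Returns (first_col, last_col, carriers): 0-based inclusive column indices and
    the names of the sequences that carry residues throughout the run.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must all have the same length")

    blocks: List[Tuple[int, int, List[str]]] = []
    run_start: Optional[int] = None
    run_carriers: Optional[Tuple[str, ...]] = None
    for col in range(length + 1):  # the extra step flushes a run ending at the last column
        carriers: Optional[Tuple[str, ...]] = None
        if col < length:
            present = tuple(n for n in names if alignment[n][col] != gap)
            if 0 < len(present) < len(names):
                carriers = present
        if carriers is not None and carriers == run_carriers:
            continue  # the same sequences still carry residues: extend the run
        if run_carriers is not None and col - run_start >= min_len:
            blocks.append((run_start, col - 1, list(run_carriers)))
        run_start, run_carriers = (col, carriers) if carriers is not None else (None, None)
    return blocks


# Invented toy alignment with a 7-residue insertion shared by two sequences.
toy = {
    "host_like": "MAGTWYDEKQLLSE",
    "phage_like": "MAGTWYDEKQLLSE",
    "other": "MAG-------LLSE",
}
print(insertion_blocks(toy))  # [(3, 9, ['host_like', 'phage_like'])]
```

Requiring the same set of carriers across the whole run keeps two adjacent but differently shared indels from being merged into one.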

Phylogenetic analyses of the PSII core reaction center proteins further support the cyanobacterial
origin of the phage genes and, together with knowledge of phage host ranges (15), suggest that they
were acquired multiple times from their hosts. The phage D1 and D2 proteins clustered with
marine cyanobacteria in both distance (Fig. 2A & B) and maximum likelihood trees. Proteins
encoded by phage that infect only Prochlorococcus clustered with Prochlorococcus, while those
from a phage that infects only Synechococcus (sequences taken from (12)) clustered with
Synechococcus, as did an environmental sequence (BAC9D04) encoding both D1 and phage
structural genes (32). Moreover, D1 from two of the Prochlorococcus phage clustered within
Prochlorococcus clades that match their host range (Fig. 2A). However, D1 from the third
Prochlorococcus phage did not cluster within a specific Prochlorococcus clade, suggesting that its
psbA gene was either acquired from an as yet uncultured Prochlorococcus type or has diverged
to an extent that prevents identification of the common ancestor. The fact that the phage D1 and
D2 proteins are distributed in both the Prochlorococcus and Synechococcus clades, and are
largely consistent with phage host range, suggests that the genes were acquired in independent
transfer events from their cyanobacterial hosts (sensu (2, 4)). This could have occurred de novo
between distinct host and phage several times, or these genes may have been transferred from
host to phage in a process akin to gene conversion following an ancestral transfer event (see
Discussion section). If the host genes in phage had resulted from a single ancestral event, followed by
subsequent vertical or lateral transfers from phage to phage, the phage- and host-encoded genes
would have formed monophyletic clades distinct from each other.
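The closing argument turns on monophyly. Under a single ancestral transfer, all phage-encoded sequences would fall in one clade of a rooted tree. The sketch below represents a rooted tree as nested Python tuples of leaf names, which serves as a stand-in for a parsed Newick tree. The toy topology is hypothetical and only mimics the pattern described, with each phage leaf nested inside the clade of the host genus it infects.

```python
from typing import FrozenSet, List, Set, Union

# A rooted tree as nested tuples: a leaf is a string, an internal node a tuple of subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node (leaves and internal nodes) of a rooted tree."""
    found = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(all_clades(child))
    return found


def is_monophyletic(tree: Tree, taxa: Set[str]) -> bool:
    """True if some single node has exactly `taxa` as its descendant leaves."""
    return frozenset(taxa) in set(all_clades(tree))


# Hypothetical topology: each phage leaf sits inside its host genus clade.
toy_tree = ((("pro_1", "phage_P"), "pro_2"), (("syn_1", "phage_S"), "syn_2"))
print(is_monophyletic(toy_tree, {"phage_P", "phage_S"}))  # False
print(is_monophyletic(toy_tree, {"pro_1", "phage_P"}))    # True
```

The test depends on where the tree is rooted. This is one reason the trees in Fig. 2 are rooted with an outgroup (Arabidopsis thaliana).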

Phylogenetic analyses of the plastocyanin electron transport protein in host and phage also
suggest that the phage petE gene is of cyanobacterial origin (Suppl. Fig. 9). However, the data are
not conclusive as to where within the cyanobacteria the phage gene originated. The phage
protein clusters with filamentous cyanobacteria but contains a 10 amino acid indel found only in
unicellular cyanobacteria (Suppl. Fig. 6), and GeneRAGE clustering analysis did not resolve the
affiliation of the phage plastocyanin. Both phylogenetic and GeneRAGE analyses of the
ferredoxin electron transport protein encoded by the petF gene were inconclusive as to the origin
of the phage gene. These results, together with the greater divergence estimates (Ka/Ks) for the
phage- and Prochlorococcus-encoded petE and petF gene pairs (0.19-0.43) than for
Prochlorococcus gene pairs (0.03-0.07) (Suppl. Table III), suggest that the photosynthetic
electron transport genes encoded by P-SSM2 either originated from an organism for which no
close relative currently exists in the database, or have diverged to an extent that prevents
inference of their origin. These may be new genes in the making.

Previous analyses of HLIPs in cyanobacterial genomes revealed genetically diverse types, with
distinct clusters formed by single- and multiple-copy HLIPs (26).
Genes found in a single copy in each of the 4 sequenced marine cyanobacterial genomes form 4
distinct clusters (GR C5, C6, C7 and C8 in Fig. 3) that are interspersed with HLIPs from
freshwater cyanobacteria in a large cluster, whereas multi-copy Prochlorococcus HLIPs
form a separate cluster (Fig. 3). While bootstrap support for these two broad clusters is not high,
distance and maximum likelihood analyses both resulted in the same two broad groupings,
lending some support to this tree architecture. When the phage HLIPs are added to this analysis,
some interesting patterns appear. Ten of the eleven phage HLIPs cluster with those
encoded by multiple gene copies in Prochlorococcus, some with more bootstrap support than
others. Their exclusion from the clusters containing freshwater cyanobacterial HLIPs and the HLIP
types found in a single copy in marine unicellular cyanobacteria receives stronger bootstrap support
(Fig. 3). These results were obtained from three different analyses (distance and maximum likelihood
phylogenetic analyses and GeneRAGE clustering). Indeed, GeneRAGE clusters 7 of the 11 phage
HLIPs with the four HLIP types encoded by multi-copy genes in Prochlorococcus genomes (GR
C10, C12, C14 and C15), with the remaining 4 of indeterminate affiliation in the GeneRAGE analysis.
As with nearly all of the multi-copy HLIP sequences from Prochlorococcus (28 of
29), all but one of the phage HLIPs contain a 9 amino acid signature sequence at the carboxy
terminus of the protein that is absent from other cyanobacterial HLIPs (26), further supporting
a connection between phage hli genes and multi-copy hli genes in the host.
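Two sequence features can be screened with a short script. The first is the HLIP-defining motif AExxNGRxAMIGF. The second is the 9-residue C-terminal consensus TGQIIPGI/FF, which is used with the 6-of-9 criterion from the Fig. 3 legend. The motif and consensus come from Suppl. Fig. 8. Several details of this sketch are assumptions: "x" is read as any residue, and the consensus is scored as the best ungapped 9-residue window within the last `tail` residues. The helper names and the test sequence are hypothetical, and the sequence is not a real HLIP.

```python
import re
from typing import Tuple

# HLIP-defining motif from Suppl. Fig. 8, reading each "x" as any residue.
HLIP_MOTIF = re.compile(r"AE..NGR.AMIGF")

# C-terminal consensus TGQIIPGI/FF: the residues allowed at each of its 9 positions.
C_TERM_CONSENSUS = ("T", "G", "Q", "I", "I", "P", "G", "IF", "F")


def has_hlip_motif(protein: str) -> bool:
    return HLIP_MOTIF.search(protein.upper()) is not None


def best_cterm_match(protein: str, tail: int = 15) -> Tuple[int, int]:
    """Best (matches, start) over ungapped 9-residue windows in the last `tail` residues."""
    seq = protein.upper()
    width = len(C_TERM_CONSENSUS)
    best = (0, -1)
    for start in range(max(0, len(seq) - tail), len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(aa in allowed for aa, allowed in zip(window, C_TERM_CONSENSUS))
        if score > best[0]:
            best = (score, start)
    return best


def carries_signature(protein: str, threshold: int = 6) -> bool:
    """Fig. 3 criterion: at least 6 of the 9 consensus positions matched."""
    return best_cterm_match(protein)[0] >= threshold


# Invented test sequence containing both features; not a real HLIP.
toy_hlip = "MSEQAEIFNGRLAMIGFLAAKWLGSTGQIIPGIF"
print(has_hlip_motif(toy_hlip), best_cterm_match(toy_hlip))  # True (9, 25)
```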

Although the lack of strong bootstrap support for most of the clustering patterns in Fig. 3 makes it
impossible to draw definitive conclusions, the co-occurrence of phage and Prochlorococcus HLIPs
in four different clusters suggests that hli genes have probably been transferred
between host and phage multiple times. Moreover, the clustering of phage HLIPs with a subset of
the HLIPs that are found exclusively in Prochlorococcus suggests that these distinct hli gene
types may have been reacquired from phage after a period of evolution, leading to the expansion
of the hli multigene family in this genus.
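The bootstrap values discussed here are obtained by re-running the tree analysis on pseudo-replicate alignments. Each replicate is built by resampling alignment columns with replacement. Support for a group is the percentage of replicate trees that recover it. The sketch below assumes a dict of equal-length aligned sequences. The `infer_clades` argument stands in for whatever tree-building method is used (distance, parsimony or likelihood). The `closest_pair` builder in the usage is a deliberately crude, hypothetical stand-in and not the software used in this study.

```python
import random
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

Alignment = Dict[str, str]


def resample_columns(alignment: Alignment, rng: random.Random) -> Alignment:
    """One bootstrap pseudo-replicate: alignment columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(
    alignment: Alignment,
    infer_clades: Callable[[Alignment], Iterable[FrozenSet[str]]],
    group: Set[str],
    n_reps: int = 100,
    seed: int = 1,
) -> float:
    """Percentage of pseudo-replicates whose inferred tree contains `group` as a clade."""
    rng = random.Random(seed)
    target = frozenset(group)
    hits = sum(
        target in set(infer_clades(resample_columns(alignment, rng)))
        for _ in range(n_reps)
    )
    return 100.0 * hits / n_reps


def closest_pair(alignment: Alignment) -> List[FrozenSet[str]]:
    """Crude, hypothetical stand-in for tree building: report only the most similar pair."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]

    def mismatches(pair):
        x, y = alignment[pair[0]], alignment[pair[1]]
        return sum(c1 != c2 for c1, c2 in zip(x, y))

    return [frozenset(min(pairs, key=mismatches))]


toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTGTCCGAAC"}
print(bootstrap_support(toy, closest_pair, {"a", "b"}, n_reps=200))
```

With only a few informative columns, resampling can easily tip the result. This is why groupings supported by little sequence signal receive low bootstrap values.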

Discussion 

Our findings, along with those in the companion paper (33), indicate that photosynthesis genes are
widespread among phage that infect both Prochlorococcus and Synechococcus. Though they are not
universal (13), they are found in representatives of both the Myoviridae and Podoviridae. The PSII
core reaction center gene psbA has been found in all phage reported to have photosynthesis genes,
suggesting that it plays a particularly significant role in these phage. Other photosynthesis genes are
more sporadically distributed among the phage genomes analyzed to date. Genes encoding HLIPs were
found in all three Prochlorococcus phage we analyzed, but in only one of the five Synechococcus phage
reported by Millard et al. (33). In contrast, the second PSII core reaction center gene, psbD, was found
in all Synechococcus phage but in only one Prochlorococcus phage. The small number of phage genomes
presently available precludes robust conclusions about this asymmetry, but if
the trend holds up as more genomes are sequenced, it is likely that these two genes confer a
differential benefit on the phage that is largely shaped by genus-level attributes of their
cyanobacterial host.

The photosynthetic electron transport genes were found in only one Prochlorococcus phage and
no Synechococcus phage, whereas a transaldolase gene has been found in both Prochlorococcus
myoviruses (MBS, FR, SWC unpubl. data) and one Synechococcus phage (33). Until more phage
genome sequences are available, it is not possible to determine the significance of what appears to
be a sporadic distribution of these host genes among phage. However, assuming that the genes are
functional, this scattered distribution may have arisen from differential gain and loss driven by
trade-offs between the burden of carrying such genes and their utility during infection.
Alternatively, we may simply be observing the transient passage of host genes through the phage
genome pool.

The arrangements of photosynthesis genes in the Prochlorococcus and Synechococcus phage
share some properties (compare Fig. 1 of this study and (33)), yet are distinctly different from
their organization in host genomes (20-22). In phage infecting both genera we find adjacent psbA
and psbD genes, adjacent hli and psbA genes, and psbA adjacent to a T4-like phage gene encoding
gp49. Yet phylogenetic analyses show that the proteins encoded by psbA and psbD from Prochlorococcus
phage cluster with those from Prochlorococcus, and in the one Synechococcus phage
available for analysis, these proteins cluster with those from Synechococcus (Fig. 2A, 2B). One
likely explanation for these findings is that the genes were acquired from their respective hosts in
separate transfer events, integrating at recombination hot-spots within the phage genome and
forming gene arrangements that may be advantageous. Alternatively, one early transfer event may
have occurred, and the observed gene organization patterns formed prior to the divergence of
these phage. In this latter case, for the phage gene sequences to resemble those of their respective
hosts, they would have had to be swapped between phage and host in a process similar to gene
conversion, whereby one gene is replaced by another in a non-reciprocal fashion. This gene
conversion most likely proceeded with the host gene replacing the phage gene, because cyanobacterial
phylogenies inferred from psbA and psbD gene products are congruent with those from other
genes (Fig. 2A, 2B, (16-19)). This latter scenario would suggest that encoding PSII reaction
center genes similar to those of the host is advantageous.

The presence of highly conserved PSII reaction center and hli genes in the 3 Prochlorococcus
phage suggests that strong evolutionary pressure has driven their acquisition and retention. This is
likely to have important implications for phage-host interactions during infection. It has been
known for some time that viral infection of many photosynthetic organisms leads to a decline in
photosynthetic rates soon after infection (34, 35). This decline is attributed to damage to the PSII
membrane-protein complexes (36, 37) and may be due to oxidative stress caused by an increase
in destructive reactive oxygen species following infection (37). However, in many phage-infected
unicellular cyanobacteria, the production of phage progeny depends on
photosynthetic activity continuing until just prior to lysis (38, 39). Phage PSII reaction center
proteins may, if expressed, prevent photoinhibitory damage to PSII in Synechococcus (12). We
further suggest that expression of phage PSII reaction center proteins, as well as the
photoprotective HLIPs, may help maintain photosynthetic activity during infection of
Prochlorococcus, increasing phage fitness and resulting in selection for cyanophage
that encode functional photosynthetic genes.

Our analysis of host genes in phage has implications not only for phage fitness but also for the
evolution of the hosts, as there is suggestive evidence that phage may have mediated horizontal
gene transfer, and hence expansion, of the hli multigene family in the hosts. It has recently been
suggested that widely distributed, single-copy genes are resistant to horizontal transfer (40), while
sporadically distributed multi-copy genes are those most likely to have been dispersed by this
route (40, 41). The clustering patterns displayed by the hli genes in our analyses, though not
statistically robust, are consistent with this tenet. Each of the single-copy hli gene types
common to the four sequenced unicellular marine cyanobacteria (20-22) is likely to have been
vertically inherited, as is evident from the conserved gene arrangement surrounding these hli
types (26) and from their clustering with those from the other marine unicellular cyanobacteria
((26), Fig. 3). In contrast, the hli gene types that are present in multiple copies per genome are
found in only some Prochlorococcus genomes. It is these latter hli gene types that are found in
the Prochlorococcus phage genomes, with at least one phage hli gene in each of the 4 clusters of
multi-copy Prochlorococcus hli gene types (Fig. 3). We therefore suggest that phage may have
mediated the horizontal dispersal of these multi-copy genes among Prochlorococcus.

Interestingly, none of the 3 Prochlorococcus phage carrying these hli genes infects marine
Synechococcus (15), suggesting that if these genes were laterally transferred, the breadth of that
transfer matches the limits of the phage host ranges.

The presence of numerous hli genes in Prochlorococcus MED4, a high-light-adapted ecotype, is
likely to have facilitated its occupation of the surface waters of the open oceans (20, 26, 42).
Indeed, cyanobacteria with multiple hli genes have been shown to have a competitive advantage,
upon shifts to high light, over mutants in which some of these gene copies have been inactivated
(30). Our hypothesized phage-mediated expansion of the hli multigene family may have
facilitated the specialization of Prochlorococcus to high-irradiance surface ocean waters, leading
to their dominance over other photosynthetic organisms in these environments. Other
photosynthetic genes found in phage are also present in multiple copies in many cyanobacteria,
including psbA, psbD and petF (Suppl. Table IV). The importance of gene duplication in the
evolution of new gene functions is well recognized in other systems (43, 44), and thus it would not
be surprising if it were playing a role in the evolution of physiological variants within the
Prochlorococcus cluster.

The exchange of photosynthetic genes between Prochlorococcus and the phage that infect them
could have significant implications for the evolutionary trajectory of both host and phage, and
may represent a more general phenomenon of metabolic facilitation of key host processes. That
is, the selection of host genes that are retained in a particular phage could reflect key selective
forces in the host environment. Indeed, phosphate sensing and acquisition genes have been found
in phage that infect organisms in low-phosphate environments (8, 9). Might we also find salt
tolerance genes in phage infecting halotolerant organisms, and thermal tolerance genes in phage
that infect thermophilic organisms? Such coupled evolutionary processes in host and phage, if
widespread, may play a role in defining host ranges for phage and niche space for hosts, leading
to specialization and even speciation.

Figure legends

Fig. 1. Arrangement of photosynthesis genes in three Prochlorococcus phage genomes. (A)
Podovirus P-SSP7, (B) Myovirus P-SSM2, (C) Myovirus P-SSM4. Solid bars indicate genes
related to photosynthesis, hatched bars indicate genes commonly found in phage, and open bars
indicate predicted open reading frames of unknown function. Gene=protein designations:
psbA=D1, psbD=D2, hli=HLIP, petE=plastocyanin, petF=ferredoxin, 8=T7-like head-to-tail
connector, 9=T7-like portal protein, 10=T7-like capsid protein, nrdB=T4-like ribonucleotide
reductase beta subunit, 49=T4-like restriction endonuclease VII, td=T4-like thymidylate
synthetase.

Fig. 2. Distance trees of PSII core reaction center proteins (A) D1 (psbA) and (B) D2 (psbD).
Phage sequences are shown in bold. The host strain(s) that each phage infects are indicated with
black squares. Trees were generated from partial amino acid sequences, with 244 and 336 amino
acids included in the analyses for D1 and D2, respectively (see Suppl. Figs. 4 and 5). Bootstrap
values for distance and maximum parsimony analyses, and quartet puzzling values for maximum
likelihood analysis, greater than 50% are shown to the left of the nodes (distance/maximum
likelihood/maximum parsimony). Trees were rooted with genes from Arabidopsis thaliana.
Essentially the same topology was obtained when nucleotide trees (with 3rd codon positions excluded)
were constructed, except for psbA from P-SSP7, which clustered with HL Prochlorococcus in
nucleotide trees, albeit with low bootstrap support. Pro = Prochlorococcus, Syn =
Synechococcus, Anab = Anabaena, Syncy = Synechocystis.

Fig. 3. Distance tree of 80 HLIPs from cyanobacteria and phage. Phage HLIPs appear in bold.
The tree was generated from partial amino acid sequences (the 36 amino acid region included in
analyses is indicated in Suppl. Fig. 8), and gaps were treated as missing data. GeneRAGE clusters
are indicated to the right of the tree (GR C#), with cluster designations following (26).
Discrepancies between GeneRAGE and distance tree clustering were found for 3 HLIPs and are
indicated by the dashed line and their GR cluster designations. Asterisks denote proteins containing
at least 6 of the 9 residues of the C-terminal consensus sequence. Bootstrap and quartet
puzzling values greater than 50% are shown to the left of the nodes for distance and maximum
likelihood analyses, respectively. The tree was rooted with the single HLIP from Arabidopsis
thaliana. Abbreviations as for Fig. 2.

Table I: Phage used in this study and their photosynthesis-related genes. Phage family and host-range
information as per (15). Bold indicates the host the phage was isolated on. *From Mann et al. (12).

Phage     Family      Host strains infected      Genes
P-SSP7    Podovirus   Pro MED4 (HL)              psbA, 1 hli
          Myovirus    Pro NATL1A, NATL2A,        psbA, 6 hli genes, petF, petE


Suppl. Table II: Range of nucleotide and amino acid percent identities for pairs of phage and
Prochlorococcus genes and among Prochlorococcus genes. Prochlorococcus strains MED4, MIT9313 and
SS120 were used for these analyses, with only one copy of identical genes within a single genome
included.

                  Among Prochlorococcus genes     Phage to Prochlorococcus genes
Gene              nucleotide     amino acid       nucleotide     amino acid
psbA              74-81%         87-96%           69-83%         85-93%
psbD              71-80%         85-93%           70-78%         83-95%
petE              69-71%         74-76%           47-54%         41-47%
petF              73-78%         91-94%           48-57%         51-53%
hli cluster 10    58-86%         50-91%           51-63%         43-64%
hli cluster 12    53-73%         56-75%           58-70%         63-71%
hli cluster 14    71-94%         77-91%           73-85%         74-94%
hli cluster 15    64-70%         59-66%           56-66%         54-64%
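Suppl. Table II reports ranges of pairwise percent identity. The caption does not state how gapped positions were treated. The sketch below therefore assumes that identity is computed over aligned columns where neither sequence has a gap. `percent_identity` and `identity_range` are hypothetical helper names, and the toy sequences are invented.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence is gapped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


def identity_range(seqs: Dict[str, str]) -> Tuple[float, float]:
    """Minimum and maximum pairwise percent identity across all pairs in `seqs`."""
    values = [percent_identity(seqs[m], seqs[n]) for m, n in combinations(sorted(seqs), 2)]
    return min(values), max(values)


print(percent_identity("ACGT-A", "ACGTTT"))                      # 80.0
print(identity_range({"x": "AAAA", "y": "AAAT", "z": "AATT"}))  # (50.0, 75.0)
```

For the "Phage to Prochlorococcus" columns, only phage-versus-host pairs would be compared, rather than every pair as `identity_range` does here.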

Suppl. Table III: Divergence estimates for Prochlorococcus gene pairs (using strains MED4, MIT9313 and
SS120) and for pairs of phage- and Prochlorococcus-encoded genes. Ks = the number of nucleotide substitutions that
do not cause an amino acid change per synonymous site, and Ka = the number of nucleotide substitutions
that cause an amino acid change per non-synonymous site. For the petF and hli genes, where multiple gene
types are found, estimates are for genes clustered in the same group as determined from GeneRAGE
analyses. Note that only one of any set of identical hli genes from the same genome was included in the analyses.
Ranges are provided for gene pairs for which the Kimura two-parameter correction was applicable.

                           Among Prochlorococcus       Phage to Prochlorococcus
Gene       Protein         Ks           Ka/Ks          Ks           Ka/Ks
psbA       D1              1.47-2.14    0.03-0.05      0.93-3.11    0.03-0.07
psbD       D2              1.30-1.95    0.04-0.05      2.24         0.02
petE       plastocyanin    1.37-1.85    0.06-0.07      1.13-2.42    0.20-0.40
petF       ferredoxin      1.88         0.03           1.08-2.65    0.19-0.43
hli c10    HLIP C10        0.59-1.59    0.09-0.31      0.99-1.80    0.14-0.39
hli c14    HLIP C14        0.31-1.96    0.00-0.18      0.65-1.59    0.05-0.24
hli c15    HLIP C15        0.82         0.21           1.64         0.13
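The caption notes that ranges are given only where the Kimura two-parameter (K2P) correction was applicable. The sketch below shows that correction on a plain set of aligned nucleotide sites:

d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)

Here P and Q are the proportions of transitions and transversions. The function returns None when either logarithm argument is not positive, which is where the correction breaks down for saturated sequences. This is an illustration under assumptions. The Ka and Ks values in the table additionally require partitioning sites into synonymous and non-synonymous classes with a codon-based method not described in this section. `k2p_distance` is a hypothetical name.

```python
import math
from typing import Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> Optional[float]:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Returns None when a logarithm argument is <= 0, i.e. the sequences are
    too divergent (saturated) for the correction to apply.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        return None
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return None  # correction not applicable
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


s1 = "AAAAAAAAAAGGGGGGGGGG"
s2 = "GGCAAAAAAAGGGGGGGGGG"  # 2 transitions and 1 transversion over 20 sites
print(round(k2p_distance(s1, s2), 4))  # 0.1702
print(k2p_distance("AAAA", "CCCC"))    # None
```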

Suppl. Table IV: Number of copies of the relevant photosynthesis genes found in cyanobacteria and
Prochlorococcus phage.

                            psbA   psbD   petE   petF   hli
Prochlorococcus MED4        1      1      1      3      22
Prochlorococcus MIT9313     2      1      1      2      9
Prochlorococcus SS120       1      1      1      2      13
Synechococcus WH8102        4      2      1      4      8
Synechocystis PCC6803       3      2      1      4      4
Anabaena PCC7120            5      2      1      3      8
Podovirus P-SSP7            1      0      0      0      1
Myovirus P-SSM2             1      0      1      1      6
Myovirus P-SSM4             1      1      0      0      4
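The Discussion states that psbA, psbD and petF occur in multiple copies in many cyanobacteria, and this can be tallied directly from Suppl. Table IV. The counts below are transcribed from the table. The helper name `multicopy_genomes` is illustrative.

```python
from typing import Dict, List, Tuple

# Copy numbers transcribed from Suppl. Table IV, in the column order below.
GENES = ("psbA", "psbD", "petE", "petF", "hli")
COPIES: Dict[str, Tuple[int, ...]] = {
    "Prochlorococcus MED4": (1, 1, 1, 3, 22),
    "Prochlorococcus MIT9313": (2, 1, 1, 2, 9),
    "Prochlorococcus SS120": (1, 1, 1, 2, 13),
    "Synechococcus WH8102": (4, 2, 1, 4, 8),
    "Synechocystis PCC6803": (3, 2, 1, 4, 4),
    "Anabaena PCC7120": (5, 2, 1, 3, 8),
}


def multicopy_genomes(gene: str) -> List[str]:
    """Cellular genomes in the table carrying more than one copy of `gene`."""
    column = GENES.index(gene)
    return sorted(name for name, counts in COPIES.items() if counts[column] > 1)


for gene in GENES:
    print(f"{gene}: multi-copy in {len(multicopy_genomes(gene))} of {len(COPIES)} genomes")
```

When run as written, the script reports psbA as multi-copy in 4 of the 6 cellular genomes and psbD in 3. It reports petF and hli as multi-copy in all 6, and petE in none.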

Supplementary Figure Legends

Suppl. Fig. 4. Alignment of D1 amino acid sequences deduced from phage- and cellular-encoded
psbA genes. Note the additional 7 amino acids towards the C-terminus of the cyanobacterial and
phage D1 proteins. The amino acids used in phylogenetic analyses correspond
to the region extending from position 84 (VPSS) to position 327 (RANL) of the protein from
Prochlorococcus MED4. Pro = Prochlorococcus, Syn = Synechococcus, Ana = Anabaena, Syncy
= Synechocystis.

Suppl. Fig. 5. Alignment of D2 amino acid sequences deduced from phage- and cellular-encoded
psbD gene sequences. Note the 7 amino acid indel in the D2 protein from Prochlorococcus
MED4 and SS120 and Prochlorococcus phage P-SSM4. These additional 7 amino acids are not
found in the D2 proteins from other cyanobacteria or from the Synechococcus phage S-PM2. The
amino acids used in phylogenetic analyses correspond to the region extending from position 14
(FDVL) to position 358 (GNAL) of the protein from Prochlorococcus MED4. Pro =
Prochlorococcus, Syn = Synechococcus, Ana = Anabaena, Syncy = Synechocystis.

Suppl. Fig. 6. Alignment of plastocyanin amino acid sequences deduced from phage- and cellular-encoded
petE gene sequences. Note the 10 amino acid indel in the plastocyanin protein from
unicellular cyanobacteria and Prochlorococcus phage P-SSM2. The amino acids used in
phylogenetic analyses correspond to the region extending from position 36 (GMLA) to position
116 (VIVE) of the protein from Prochlorococcus MED4. Pro = Prochlorococcus, Syn =
Synechococcus, Ana = Anabaena, Syncy = Synechocystis, Tricho = Trichodesmium.

Suppl. Fig. 7. Alignment of ferredoxin amino acid sequences deduced from phage- and cellular-encoded
petF gene sequences. Pro = Prochlorococcus, Syn = Synechococcus, Ana = Anabaena, Syncy =
Synechocystis.

Suppl. Fig. 8. Alignment of HLIP amino acid sequences deduced from cyanophage- and cellular-encoded
hli genes. The motif used for defining HLIPs (AExxNGRxAMIGF), as well as the
region of the C-terminal consensus sequence (TGQIIPGI/FF) found in 28 out of 44
Prochlorococcus HLIPs and 10 out of 11 Prochlorococcus phage HLIPs, are indicated below the
sequences. The hli gene number designation is shown after the strain ID. The amino acids used in
phylogenetic analysis extend from 4 amino acids N-terminal of the
AExxNGRxAMIGF motif to the last amino acid of the C-terminal motif. Pro = Prochlorococcus,
Syn = Synechococcus, Ana = Anabaena, Syncy = Synechocystis.

Suppl. Fig. 9. Distance tree of the plastocyanin photosynthetic electron transport protein encoded
by the petE gene. The phage protein is shown in bold. The tree was generated from partial amino
acid sequences; the 80 amino acid region included in analyses is shown in Suppl. Fig. 6. Bootstrap values
(for distance and maximum parsimony analyses) and quartet puzzling support (for maximum
likelihood analysis) greater than 50% are shown to the left of the nodes (distance/maximum
likelihood/maximum parsimony). The tree was rooted with the gene from Arabidopsis thaliana.
Cyanobacterial genus abbreviations: Pro = Prochlorococcus, Syn = Synechococcus, Anab =
Anabaena, Syncy = Synechocystis, Tricho = Trichodesmium.
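The analysed regions in these legends are given as residue positions in the ungapped Prochlorococcus MED4 protein. For D1, positions 84 to 327 span 327 - 84 + 1 = 244 residues, which is the number given in the Fig. 2 legend. The sketch below shows how such reference coordinates could be mapped onto alignment columns, assuming `-` marks gaps. The function names and the toy alignment are hypothetical and not the software used in this study.

```python
from typing import Dict


def ref_to_columns(ref_aligned: str, gap: str = "-") -> Dict[int, int]:
    """Map 1-based residue positions of the ungapped reference to 0-based alignment columns."""
    mapping: Dict[int, int] = {}
    position = 0
    for col, residue in enumerate(ref_aligned):
        if residue != gap:
            position += 1
            mapping[position] = col
    return mapping


def slice_by_reference(alignment: Dict[str, str], ref: str, first: int, last: int) -> Dict[str, str]:
    """Keep the columns spanning reference residues first..last (inclusive) in every sequence."""
    columns = ref_to_columns(alignment[ref])
    start, end = columns[first], columns[last]
    return {name: seq[start:end + 1] for name, seq in alignment.items()}


toy = {"ref": "MA-GTWQ", "other": "MAKGT-Q"}
print(slice_by_reference(toy, "ref", 2, 5))  # {'ref': 'A-GTW', 'other': 'AKGT-'}
```

Columns in which the reference itself is gapped are kept if they fall inside the range. Whether the original analyses kept or dropped such columns is not stated in the legends.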


[Suppl. Fig. 4: alignment of D1 amino acid sequences]


[Suppl. Fig. 5: alignment of D2 amino acid sequences]


[Suppl. Fig. 6: alignment of plastocyanin amino acid sequences]


[Suppl. Fig. 7: alignment of ferredoxin amino acid sequences]


[Suppl. Fig. 8: alignment of HLIP amino acid sequences]




Suppl. Fig. 9. [Figure: phylogenetic tree with three node-support values per node; taxa: Prochlorococcus MED4, SS120 and MIT9313, Synechococcus WH8102 and PCC7942, Synechocystis PCC6803, myovirus P-SSM2, Trichodesmium IMS101, Anabaena PCC7120, Chlamydomonas, Spinacia oleracea and Arabidopsis thaliana.]




Chapter  V 

Prochlorococcus  cyanophage  genomes: 

Phages  adapted  for  infection  of  open-ocean  photosynthetic  cells 





Matthew  B.  Sullivan  et  al.1 

ABSTRACT 

Phage genomic analyses suggest that dsDNA phages evolve through the exchange of
modular cassettes of genes, perhaps forming a limited number of phage types that contain
unique genes specialized for interactions with particular hosts. Here we present an
analysis of three genomes from phages that infect Prochlorococcus, the numerically dominant
primary producer in the vast oligotrophic surface oceans. Core genes in these genomes
suggest that podovirus P-SSP7 is a T7-like phage, while myoviruses P-SSM2 and
P-SSM4 are both T4-like phages. They thus appear to be variations of two well-known phages that
have been modified for the infection of photosynthetic hosts living in oligotrophic environments.
All three phage genomes encode core photosynthetic proteins that are full-length, conserved, and
clustered in the genome, features which suggest they are being maintained by selection. The
P-SSP7 genome also contains an integrase gene, which suggests this phage is capable of integrating
into its host. If this gene is functional, this would not only have evolutionary and ecological
implications for both phage and host, but P-SSP7 would also represent the first temperate T7-like phage
and the first temperate marine cyanophage in culture. In addition, the P-SSM2 and P-SSM4
genomes collectively encode proteins that may help the phage and/or host
respond to phosphate stress, mobilize carbon stores, and synthesize nucleotides,
lipopolysaccharides, cobalamin and accessory pigments. We hypothesize that many of these
phage-encoded genes are functional and likely help the phage exploit host metabolic capabilities during
infection.


1  This  chapter  is  the  draft  of  a  manuscript  that  includes  the  following  co-authors  (in  alphabetical  order): 
Jan-Fang  Chang,  Sallie  W.  Chisholm,  Maureen  Coleman,  Jane  Grimwood,  David  Mead,  Forest  Rohwer, 
Peter  Weigele 
INTRODUCTION


The taxonomy of bacterial viruses (phages) is highly controversial and actively debated
(Lawrence, Hatfull, and Hendrix, 2002; Rohwer and Edwards, 2002). This controversy arises from
observations among the ‘lambdoid’ phages (those phages that are able to recombine
productively with coliphage lambda) that their DNA contains regions of high sequence similarity
adjacent to regions of low sequence similarity, which suggests that these genomes might evolve
through the horizontal exchange of genetic material (Simon, Davis, and Davidson, 1971).

Genome sequencing, primarily of lambdoid phages, has confirmed these observations, and it has
been hypothesized that perhaps all dsDNA phages evolve primarily through the horizontal
exchange of genetic material, in the form of modular functional cassettes, through a global phage
genome pool (termed the mosaic theory of phage evolution) (Hendrix et al., 1999). Although any
given phage can in principle access all the sequences in the global phage genome pool, it has been
suggested that this access must be limited in some manner (Hendrix et al., 1999),
perhaps by host range (i.e., the range of hosts a given phage can infect) (Sullivan, Waterbury, and
Chisholm, 2003).

If phages had access to a global phage genome pool, then one would predict that some
phages might even contain genes considered characteristic of a different domain of life (i.e.,
a phage that infects only bacteria might contain a eukaryotic or archaeal gene not present
in its hosts) (Hendrix, 2002). This has indeed been observed among the genomes of the
mycobacteriophages (phages that infect Mycobacteria) (Pedulla et al., 2003). Further, examples of
other potential gene exchanges between phages are commonplace in the current literature,
particularly among the lambdoid phages, which are relatively well represented in phage genome
databases, but also among two other relatively well-represented groups of phage genomes: the
mycobacteriophages (Pedulla et al., 2003) and the “dairy phages” (phages that infect lactic acid
bacteria) (Brussow, 2001). However, these groups differ quantitatively in the extent of
horizontal gene exchange: the lambdoid phages are highly mosaic, while the dairy phages
are the least mosaic (Brussow and Hendrix, 2002).

Mechanistically, the horizontal exchange implied by the mosaic theory of
evolution requires phage DNA to be in close proximity to DNA from a foreign source. This
occurs within multiply-infected hosts, where either two lytic phages (incapable of integration
into the host chromosome) co-infect the same host, or a lytic phage infects a host in which a
temperate phage (capable of stable integration of its genome into the host chromosome) has
already integrated (Hendrix, 2003). In either case, genes could be shared through homologous
or non-homologous recombination (the latter through an unknown mechanism) (Hendrix, 2003).
Further, a temperate phage has another opportunity to obtain genes horizontally (in this case
from the host) through imprecise excision of the phage genome upon induction (Calendar, 1988).
Thus, through recombination with other phages and hosts, any given phage has the potential to
access all the sequences in the global phage genome pool. This gene pool includes variants not
only of the structural and developmental genes directly necessary for phage life cycles, but also
of genes that have been copied from hosts and are being maintained in the phage gene pool
(Hendrix et al., 2000). If lytic phage genomes have fewer opportunities for recombination than
temperate phage genomes, then they might be expected to be less prone to horizontal exchange of
genes, and perhaps even to form biologically cohesive units to which a taxonomic hierarchy could
apply, because they would evolve largely by speciation, genetic adaptation and accumulation of
point mutations (Kovalyova and Kropinski, 2003).

While the genomes of lytic phage types (e.g., phages with genomes like those of the
coliphages T7 and T4) are underrepresented in phage genome databases, analyses of these
genomes are beginning to reveal a large number of fixed, essential genes interspersed with highly
variable, non-essential genes that do not appear to be shared across groups (Desplats and
Krisch, 2003; Miller et al., 2003a; Molineux, in press). However, the hosts of these phages are
primarily alpha-, gamma- and delta-proteobacteria (with only one cyanobacterial host), so our
understanding of core genes within these phage types could be broadened with sequence
information from phages infecting a wider diversity of hosts, which would more fully represent
variations on the basic lytic lifestyle.

The marine phages are ideal candidates for such focused sequencing efforts. They are
abundant (Bergh, 1989; Bratbak et al., 1990; Proctor and Fuhrman, 1990), are significant
contributors to global biogeochemistry (Fuhrman, 1999), and are likely to include a wide
diversity of types for comparison to the nonmarine phages already in the database (Paul et
al., 2002). Although there are over 170 phage genomes in GenBank, only seven of these
sequenced phages infect marine hosts (cyanophage P60; vibriophages VpV262, KVP40, VP16T,
VP16C; roseophage SIO1; Pseudoalteromonas phage PM2), and only one infects cyanobacteria
(cyanophage P60). The genomes of cyanophage P60 (Chen and Lu, 2002), roseophage SIO1
(Rohwer et al., 2000) and vibriophage VpV262 (Hardies et al., 2003) contain genes with high
homology to nonmarine phages, but also open reading frames (ORFs) with little homology to any
known phage or host genes (Chen and Lu, 2002; Hardies et al., 2003; Rohwer et al., 2000).
This lends some support to the mosaic theory but also leaves significant ambiguities, which will
be resolved only as more marine phage genomes enter the databases. Marine phage genomes will
permit a comparison of the evolutionary and ecological relationships of phages infecting marine
versus non-marine hosts, where different pressures may shape their genomes (e.g., differences in
dispersal strategies, acclimation to environmental stressors).

The marine cyanobacteria Prochlorococcus are abundant (up to 10^5 cells ml^-1) in the
oligotrophic surface oceans (Partensky, Hess, and Vaulot, 1999), where nutrients, in particular
phosphate, are often limiting (Karl, 1999; Wu et al., 2000). Genomic analyses of two marine
phages, roseophage SIO1 (Rohwer et al., 2000) and vibriophage KVP40 (Miller et al., 2003a),
have revealed that these genomes encode a phosphate-inducible gene (phoH). This observation
suggests that the genetic composition of these phages has been shaped by their hosts’
environments. Prochlorococcus phages are abundant in these regions (~10^3 phage ml^-1)
(Sullivan, Waterbury, and Chisholm, 2003). Work with phages that infect the closely related
cyanobacteria Synechococcus suggests that these phages play a small but significant role in host
mortality (Mann, 2003), while phylogenetic inference from Prochlorococcus phage and host
genomes suggests that they are also likely to regulate population dynamics by moving
genetic material through host populations (Lindell et al., submitted). Because phage
genomics has yielded information that complements what is known about the ecology of marine
viruses and their hosts, we sought to use genome sequencing to further our understanding of
Prochlorococcus and their phages.

Here we present analyses of three genomes from Prochlorococcus phages that were
classified by electron microscopy as belonging to two morphological families: Podoviridae for
P-SSP7 and Myoviridae for P-SSM2 and P-SSM4 (Sullivan, Waterbury, and Chisholm, 2003).
The podovirus P-SSP7 infects a single high-light adapted (HL) Prochlorococcus strain, MED4,
while the first myovirus, P-SSM2, infects three low-light adapted (LL) Prochlorococcus strains
(MIT9211, NATL1A, NATL2A) and the other myovirus, P-SSM4, infects two HL and two LL
Prochlorococcus strains (MED4, MIT9215, NATL2A, NATL1A) (Sullivan, Waterbury, and
Chisholm, 2003). None of these phages cross-infects any of the ten Synechococcus strains tested
thus far. Because we had no prior knowledge of the gene content of these phages, they were
effectively selected at random with respect to their genomes. At the genome level, these phages
appear in some ways to be T7- and T4-like phages, but are genetically adapted for infection of
photosynthetic hosts in oligotrophic environments.

MATERIALS  AND  METHODS 

Preparation of cyanophage for genome sequencing

Phages were prepared for genome sequencing as previously described (Chapter
4) (Lindell et al., submitted). Briefly, phage particles were concentrated from phage lysates using
PEG, and DNA-containing phage particles were then purified from other material in the lysates
on a cesium chloride density gradient. The purified phage particles were broken open
(SDS/proteinase K), and DNA was extracted (phenol:chloroform) and precipitated (ethanol),
yielding small amounts of DNA (<1 µg). A custom LASL clone library was constructed by
Lucigen Inc. (Middletown, WI) as described previously (Breitbart et al., 2002). Subsequent clone
libraries were constructed by the DOE JGI using a similar protocol to provide further coverage of
the phage genomes. Inserts from all of these clone libraries were sequenced by the DOE JGI and
used for initial assembly of the phage genomes. The Stanford Finishing Group closed the
genomes by primer walking.

Gene  identification  and  characterization 

The genomes were examined for open reading frames (ORFs) using GeneMark
(Besemer, Lomsadze, and Borodovsky, 2001). Translated ORFs were compared with
known proteins in the non-redundant GenBank database and in the KEGG database using the
BLASTp program. The closest homologues were also compared with the query sequences to check
for shared domains and similar gene length. Translated ORFs were also analyzed for signal
sequences and transmembrane regions using the web-based software SignalP and TMHMM,
respectively (available at the CBS prediction servers, http://www.cbs.dtu.dk/services/). In some
cases, ORF annotation was also aided by gene size, domain conservation and/or synteny (gene
order), the latter as suggested for highly divergent genes encountered during phage genome
annotation (Brussow and Hendrix, 2002).
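The length-similarity check described above can be illustrated with a small filter over BLASTp results. This is a minimal sketch, not the pipeline used here: it assumes modern BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid pident length qlen slen evalue"`, and the e-value cutoff and 0.8 length ratio are hypothetical thresholds chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float
    length: int    # alignment length
    qlen: int      # query (translated ORF) length
    slen: int      # subject (database protein) length
    evalue: float


def parse_hits(lines):
    """Parse tab-separated rows in the assumed 7-column BLAST+ format."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        q, s, pid, aln, qlen, slen, ev = line.rstrip("\n").split("\t")
        hits.append(Hit(q, s, float(pid), int(aln), int(qlen), int(slen), float(ev)))
    return hits


def best_full_length_hits(hits, max_evalue=1e-5, min_len_ratio=0.8):
    """Per query, keep the lowest-e-value hit whose length is similar to the query's.

    Both thresholds are hypothetical defaults, not values from the study.
    """
    best = {}
    for h in hits:
        ratio = min(h.qlen, h.slen) / max(h.qlen, h.slen)
        if h.evalue > max_evalue or ratio < min_len_ratio:
            continue
        if h.qseqid not in best or h.evalue < best[h.qseqid].evalue:
            best[h.qseqid] = h
    return best
```

Keeping a hit only when query and subject lengths are comparable helps flag partial or domain-only matches, which is one reason to compare gene lengths in addition to BLAST scores.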
RESULTS AND DISCUSSION

General  characteristics 

While the Prochlorococcus P-SSP7 podovirus genome is not yet of finished quality at the time
of writing, draft sequence data suggest that this genome is ~43 kb long with 38.7% GC, contains
53 ORFs and terminal repeats, and encodes its own RNA polymerase (RNAP), DNA polymerase
(DNAP) and integrase genes (for comparison to other Podoviridae see Table 1). The P-SSM2
genome is 252,401 bp, 35.5% GC and contains 327 ORFs, while the P-SSM4 genome is
178,249 bp, 36.7% GC and contains 198 ORFs (for comparison to other Myoviridae see Table 2).
These myoviruses do not contain an integrase gene, and the fact that both of these genomes
assembled and closed suggests their DNA is circularly permuted. These characteristics suggest
that, broadly speaking, P-SSP7 is a T7-like phage within the Podoviridae (though unique in
containing an integrase gene), while the Myoviridae P-SSM2 and P-SSM4 are both T4-like phages.
We test this hypothesis by examining in more detail the complement of genes found within each
of these genomes.
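The genome figures above can be turned into simple comparative statistics. The sketch below, using only the values reported in the text (the P-SSP7 length is the approximate draft value), computes gene density in ORFs per kb; the `gc_percent` helper is included only to show how a GC value like those quoted is derived from a sequence, and the dictionary layout and function names are illustrative.

```python
# Values as reported in the text; P-SSP7 figures come from draft sequence (~43 kb).
GENOMES = {
    "P-SSP7": {"length_bp": 43_000, "gc_percent": 38.7, "orfs": 53},
    "P-SSM2": {"length_bp": 252_401, "gc_percent": 35.5, "orfs": 327},
    "P-SSM4": {"length_bp": 178_249, "gc_percent": 36.7, "orfs": 198},
}


def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def orfs_per_kb(length_bp: int, orfs: int) -> float:
    """Gene density: number of predicted ORFs per kilobase of genome."""
    return orfs / (length_bp / 1000)


for name, g in GENOMES.items():
    print(f"{name}: {orfs_per_kb(g['length_bp'], g['orfs']):.2f} ORFs/kb")
```

With these inputs the loop prints about 1.23, 1.30 and 1.11 ORFs per kb for P-SSP7, P-SSM2 and P-SSM4, respectively; the P-SSP7 value inherits the uncertainty of the draft assembly.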

Core genes of T7-like phages

Comparative genomic analyses of T7-like phages suggest that they are characterized by
a core of 24 genes that are widespread among them (Table 3) and are essential for T7 lytic growth
(Molineux, in press). We have identified 16 of these genes in P-SSP7 (Table 3) and, strikingly,
their gene order is also conserved (Fig. 2). By analogy to T7, these 16 genes should account for the
majority of host interactions and phage production, including the following (T7-like gene
designations for the genes involved are shown in parentheses): shutdown of host transcription (0.7),
degradation of host DNA (3, 6), DNA replication (1, 2.5, 4, 5, host thioredoxin), formation of a
channel across the cell envelope via an extensible tail (14, 15, 16), DNA packaging (18, 19), and
formation of the virion structure (8, 9, 10, 11, 12) (Molineux, in press). Notable genes that are
absent are discussed below.
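One simple way to quantify "conserved gene order" is the longest common subsequence (LCS) of the shared genes as they occur along two genomes. This is a sketch under stated assumptions: the text does not say how Fig. 2 was assessed, and the query gene order below is hypothetical. The reference list uses the T7 gene designations from the paragraph above, which T7 numbers in genome map order.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def synteny_fraction(ref_order, query_order):
    """Fraction of shared genes that lie in a common order in both genomes."""
    shared = set(ref_order) & set(query_order)
    ref = [g for g in ref_order if g in shared]
    qry = [g for g in query_order if g in shared]
    return lcs_length(ref, qry) / len(shared) if shared else 0.0


# T7 gene designations from the text, in T7 map order.
T7_ORDER = ["0.7", "1", "2.5", "3", "4", "5", "6", "8", "9",
            "10", "11", "12", "14", "15", "16", "18", "19"]
# Hypothetical query genome: gene 0.7 absent, genes 3 and 4 swapped.
QUERY = ["1", "2.5", "4", "3", "5", "6", "8", "9",
         "10", "11", "12", "14", "15", "16", "18", "19"]

print(synteny_fraction(T7_ORDER, QUERY))  # 0.9375
```

A single local inversion costs only one gene in the LCS, so the score stays high when the overall architecture is preserved, which matches the informal sense of "conserved gene order"; absent genes are excluded rather than penalized.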

What defines a T7-like phage? The lytic T7-like phages have long been characterized by
Podoviridae morphology, a genome size of approximately 40 kb, terminal repeat sequences at
the ends of the genome and a rifampicin-resistant RNA polymerase (RNAP) gene (Hausmann, 1988).
However, there are cases where the T7-like designation is unclear. For example, both vibriophage
VpV262 and roseophage SIO1 have approximately 40 kb genomes containing terminal repeats and
DNA replication genes with significant homology to T7-like phages, but lack an RNAP (Hardies
et al., 2003; Rohwer et al., 2000). In contrast, the Xanthomonas-infecting phage Xp10 contains a
rifampicin-resistant RNAP but has lambda-like morphology (Yuzenkova et al., 2003). Are these
T7-like phages? A recent review chapter suggests that, as the known diversity of T7-like phages
increases, three increasingly distinct genome types are emerging within the T7 group (Molineux,
in press). The first type, exhibited by coliphages T3 and T7, yersiniaphages φA1122 and
φYe03-12, and Pseudomonas putida phage gh-1, has a highly conserved genome organization
that differs primarily in the presence or absence of several non-essential genes. The second type
includes coliphage K1-5, salmonellaphage SP6, Pseudomonas aeruginosa phage φKMV,
Xanthomonas phage Xp10 and the marine cyanophage P60; these phages have more dynamic
genomes and a genome architecture different from that of other T7 phages, but contain a small
subset of genes homologous to other T7-like phages, including an RNAP. Finally, the third type
contains genes homologous to the DNA replication machinery of T7-like phages, but shows
limited similarity to other T7-like phages in either overall genome organization or gene content,
and lacks an RNAP. This type is represented by several distantly related T7 phages, including two
marine phages, roseophage SIO1 and vibriophage VpV262. Based upon the conserved overall
genome organization and significant similarity of genes between the Prochlorococcus P-SSP7
phage genome and other T7-like phage genomes (Table 3, Fig. 2), we suggest that this phage
is a T7-like phage.
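The three genome types described above amount to a rough decision rule. The sketch below encodes that rule as stated in the text; it is a hypothetical simplification for illustration, not a published classifier, and the boolean traits are coarse codings of the prose descriptions. P-SSP7 is deliberately not classified here because the text does not assign it to one of the three types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodovirusGenome:
    name: str
    has_rnap: bool              # encodes a T7-like RNA polymerase
    t7_replication_genes: bool  # homologues of the T7 DNA replication machinery
    t7_like_organization: bool  # overall gene order resembles that of T3/T7


def t7_genome_type(g: PodovirusGenome) -> Optional[int]:
    """Hypothetical rule of thumb for the three T7-group genome types.

    1: RNAP plus a highly conserved, T3/T7-like genome organization
    2: RNAP, but a divergent genome architecture
    3: no RNAP, but homologues of the T7 DNA replication machinery
    None: none of the above
    """
    if g.has_rnap and g.t7_like_organization:
        return 1
    if g.has_rnap:
        return 2
    if g.t7_replication_genes:
        return 3
    return None


examples = [
    PodovirusGenome("T7", True, True, True),
    PodovirusGenome("SIO1", False, True, False),
    PodovirusGenome("VpV262", False, True, False),
]
for g in examples:
    print(g.name, t7_genome_type(g))
```

The rule checks RNAP first because, in the descriptions above, its presence or absence separates types 1 and 2 from type 3, while genome organization then separates type 1 from type 2.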

Core  genes  of  T4-like  phages 

Comparative genomics has suggested that T4-like phages contain a core of genes that are
required during phage infection. Among the six T4-like phage genomes currently available, a
comparison of the presence and absence of various genes suggests a core of 18 genes involved in
DNA replication, recombination and repair, 7 genes involved in transcriptional and translational
regulation, 10 genes involved in nucleotide metabolism and 34 genes involved in the virion
structure (Table 4). We identified many of these genes in P-SSM2 and P-SSM4 and also found
that, as in other T4-like phage genomes, these genes are often grouped into functional
clusters within the genomes (Figs. 3, 4).
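The presence/absence comparison behind such a core can be sketched as a set intersection across genomes. Assumptions: "core" is taken here as a strict intersection (the text does not specify the exact criterion), and the three gene sets are hypothetical toy data using T4 gene-product names (gp20, gp23, gp43, nrdA, rIIA, rIIB) only as labels. The category counts are the ones reported in the text.

```python
from functools import reduce

# Core-gene counts per functional category, as reported in the text (Table 4).
CORE_COUNTS = {
    "DNA replication, recombination and repair": 18,
    "transcriptional and translational regulation": 7,
    "nucleotide metabolism": 10,
    "virion structure": 34,
}


def core_genes(gene_sets):
    """Return the genes present in every genome (strict intersection).

    Raises TypeError if `gene_sets` is empty.
    """
    return reduce(set.intersection, (set(s) for s in gene_sets))


# Hypothetical presence/absence data for three T4-like genomes.
genomes = {
    "phage_A": {"gp20", "gp23", "gp43", "nrdA"},
    "phage_B": {"gp20", "gp23", "gp43", "rIIA"},
    "phage_C": {"gp20", "gp23", "gp43", "nrdA", "rIIB"},
}

print(sum(CORE_COUNTS.values()))             # 69
print(sorted(core_genes(genomes.values())))  # ['gp20', 'gp23', 'gp43']
```

A strict intersection shrinks as genomes are added and is sensitive to a single missed annotation, so in practice a relaxed rule (for example, present in all but one genome) may be preferred; that choice is a design decision rather than something stated in the text.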

What defines a T4-like phage? The T4-like phages are differentiated from other
Myoviridae by genome size, host type, the presence or absence of an integrase gene and the
conformation of the DNA packaged in the particle (Table 2) (van Regenmortel et al., 2000). The
marine vibriophage KVP40 is a myovirus that is most similar to T4-like phages (Miller
et al., 2003a). The KVP40 genome shares extensive regions of sequence and gene
organization with phage T4, perhaps defining ‘core’ genes, while 65% of its ORFs are unique
to KVP40 and of unknown function, and it lacks genes involved in host DNA degradation and in
early and middle transcription (Miller et al., 2003a). Such core genes are suggested to be
common to other T4-like phages (Desplats and Krisch, 2003), but these analyses are constrained
to genomes from phages that infect gamma- and delta-proteobacteria, including this marine
vibriophage. Based upon the high number of genes homologous to T4-like phage genes
(Table 4), the lack of genes homologous to other known phage genes, and the synteny of these
genes within T4-like functional groups distributed across the genome (Figs. 3, 4), we suggest that
these Prochlorococcus myoviruses are T4-like phages.

It is striking that the genomes of these cyanobacterial phages contain a significant portion of
the core genes found in the well-studied T7-like (podovirus P-SSP7) and T4-like (myoviruses
P-SSM2 and P-SSM4) phages, which otherwise infect only proteobacteria (alpha, delta, gamma)
and, in one case, cyanobacteria (Tables 3, 4). For nearly 60 years (Demerec and Fano, 1945), the
coliphages T7 and T4 have been model systems for understanding the molecular underpinnings of
phage infection. T7 and T4 are optimized for lytic infection of fast-growing E. coli hosts (with
doubling times of ~20 minutes), which commonly dominate niches characterized by episodic
feast-famine nutrient conditions and by both anaerobic and aerobic conditions. Within minutes of
either phage attaching to the host cell surface, host metabolism is compromised and diverted
towards the transcription and translation of the phage genome to make phage particles (Kruger
and Schroeder, 1981; Kutter, Guttman, and Carlson, 1994). In contrast, the marine cyanobacteria
Prochlorococcus are slow-growing (24-hour doubling time) oxygenic phototrophs that thrive in the
nutrient-poor, aerobic surface waters of the tropical and sub-tropical oceans (Partensky, Hess, and
Vaulot, 1999). We hypothesize that these fundamentally different host generation times and
environmental conditions act as selective agents on the kinds of genes carried by phages for
infection of their respective hosts.
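As a rough illustration of the contrast in generation times, and assuming idealized, sustained exponential growth at the doubling times quoted above (which natural populations rarely maintain), the two hosts differ roughly 72-fold in the number of generations per day:

```latex
\begin{align*}
\text{E. coli:}\quad & \frac{24\ \text{h} \times 60\ \text{min h}^{-1}}{20\ \text{min}} = 72\ \text{doublings day}^{-1}\\
\text{Prochlorococcus:}\quad & \frac{24\ \text{h}}{24\ \text{h}} = 1\ \text{doubling day}^{-1}\\
\text{ratio:}\quad & 72 / 1 = 72
\end{align*}
```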

Photosynthetic genes in cyanophage

All three Prochlorococcus phage genomes contain core photosynthetic genes, previously
described in Chapter 4 (Lindell et al., submitted). Infection of E. coli by T4 and T7
phages results in shutdown of host gene transcription (Kruger and Schroeder, 1981; Miller et al.,
2003b), and turnover of core photosynthetic proteins (e.g., D1) is rapid; if photosynthesis is to
continue during infection, it would therefore be important for a phage to ensure production of
such critical proteins. Thus, these phage-encoded photosynthetic genes are hypothesized to be
important for maintaining an active PSII reaction center, allowing continued photosynthesis during
phage infection (Lindell et al., submitted; Mann et al., 2003; Millard et al., submitted).

In addition to the core photosynthetic genes found in each Prochlorococcus phage
genome, the myoviruses contain additional photosynthesis-related genes, some of which are
found in both phages and some in only one of the two myoviruses. The P-SSM2 genome
also contains two genes (ho1, pebA) that encode proteins likely involved in biosynthesis of the
accessory photosynthetic pigment phycoerythrobilin from heme (Frankenberg et al., 2001), and a
nifU-like gene that is thought to be important for biosynthesis of iron-sulfur clusters in
cyanobacteria (Nishio and Nakai, 2000). The P-SSM4 genome also contains a gene (pcyA) that
encodes a protein involved in the biosynthesis of the accessory photosynthetic pigment
phycocyanobilin (Frankenberg and Lagarias, 2003), as well as a gene (speD) that encodes a
protein involved in the synthesis of polyamines. Many of these genes (but not speD) occur in the
marine unicellular cyanobacteria sequenced to date, so it is unclear why the production of such
accessory pigments would be useful to a phage during infection. One possibility is that such
pigments could further mitigate incoming photon fluxes to the PSII reaction center, again
providing some level of protection for PSII to allow photosynthesis to continue throughout phage
infection. SpeD is not found in marine unicellular cyanobacteria, but polyamines are known in
higher plants to affect the structure and oxygen evolution rate of the PSII reaction center (Bograh
et al., 1997), which suggests that the speD gene product may likewise help maintain the PSII
reaction center, again allowing photosynthesis to continue.
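For orientation, the pigment and polyamine steps mentioned above can be summarized as a simplified reaction scheme. This follows the pathways as described for cyanobacteria (Frankenberg et al., 2001; Frankenberg and Lagarias, 2003) and the S-adenosylmethionine decarboxylase activity of SpeD in E. coli. Cofactors such as O2 and reduced ferredoxin are omitted, and the PebB step is shown only to complete the phycoerythrobilin pathway, since pebB is not among the phage genes listed here. That the phage-encoded proteins catalyze exactly these reactions is an assumption. Decarboxylated S-adenosylmethionine goes on to donate aminopropyl groups in spermidine synthesis.

```latex
\begin{align*}
\text{heme} &\xrightarrow{\text{HO1}} \text{biliverdin IX}\alpha \;(+\ \text{CO} + \text{Fe}^{2+})\\
\text{biliverdin IX}\alpha &\xrightarrow{\text{PebA}} \text{15,16-dihydrobiliverdin}
  \xrightarrow{\text{PebB}} \text{phycoerythrobilin}\\
\text{biliverdin IX}\alpha &\xrightarrow{\text{PcyA}} \text{phycocyanobilin}\\
\text{S-adenosylmethionine} &\xrightarrow{\text{SpeD}} \text{decarboxylated S-adenosylmethionine} + \text{CO}_2
\end{align*}
```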

Photosynthetic organisms fix carbon through photosynthesis only during daylight hours.
Transaldolase (encoded by the tal gene) is a key enzyme in the oxidative pentose
phosphate pathway, which is not active in cyanobacteria during daylight hours. Instead, in
cyanobacteria this pathway is used to maintain energy supply during dark metabolism and to
convert the fixed carbon products of photosynthesis into sugars, nucleotides and amino acids via a
ribulose-5-phosphate intermediate (Schmetter, 1994). It has been previously hypothesized that a
phage-encoded tal gene may give the phage access to stored carbon pools during non-photosynthetic
periods for continued phage production (Millard et al., submitted).
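The carbon-shuffling step that transaldolase catalyzes within the pentose phosphate pathway is the reversible transfer of a three-carbon (dihydroxyacetone) unit, shown here with a simple carbon balance as a worked check:

```latex
\[
\underbrace{\text{sedoheptulose 7-phosphate}}_{C_7}
+ \underbrace{\text{glyceraldehyde 3-phosphate}}_{C_3}
\;\rightleftharpoons\;
\underbrace{\text{erythrose 4-phosphate}}_{C_4}
+ \underbrace{\text{fructose 6-phosphate}}_{C_6},
\qquad 7 + 3 = 4 + 6
\]
```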

Genes  involved  in  the  specialization  of  podovirus  P-SSP7  to  its  host  metabolism 

In addition to core T7 genes and photosynthesis-related genes, the podovirus P-SSP7
genome also contains an int gene, which encodes a site-specific recombinase (Table 3).
The int gene contains conserved amino acid motifs previously identified in site-specific
recombinases, which suggests a functional role (Nunes-Duby et al., 1998) (Appendix B). The
presence of a putative site-specific recombinase in the P-SSP7 genome is consistent
with a previous hypothesis (Chapter 2) that the integrated (prophage) phase of the temperate
phage life cycle may be selected for in the oligotrophic open oceans where these isolates were
obtained (Sullivan, Waterbury, and Chisholm, 2003). However, currently available freshwater
cyanobacterial genomes (Canchaya et al., 2003; Casjens, 2003) and marine cyanobacterial
genomes (Dufresne et al., 2003; Palenik et al., 2003; Rocap et al., 2003) lack intact prophages,
and despite repeated attempts, no temperate phage has been induced from marine
cyanobacterial cultures (Waterbury and Valois, 1993). Indirect evidence from induction
experiments in the field suggests that temperate cyanophages can be induced from
Synechococcus (McDaniel et al., 2002; Ortmann, Lawrence, and Suttle, 2002), but no integrated
prophage has been demonstrated and no temperate cyanophage is yet in culture. If the
integrase gene in P-SSP7 is shown to be functional and allows this phage to integrate into
the host genome, this would be the first example of a temperate T7-like phage
and of a temperate marine cyanophage in culture.
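For readers less familiar with integrase-mediated lysogeny, the sketch below shows the classic Campbell model established for lambda-like integrases: recombination within a short core sequence shared by the phage attachment site (attP) and the bacterial site (attB) inserts the circular phage genome, leaving it flanked by hybrid sites attL and attR. This is a toy string model under stated assumptions. Whether P-SSP7 integrates, and at which site, is unknown, and the sequences used are hypothetical placeholders.

```python
def integrate(phage_circle: str, host: str, core: str) -> str:
    """Toy Campbell-model integration of a circular phage genome into a host chromosome.

    Site-specific recombination within a shared core sequence O converts
    attP (P O P') on the phage and attB (B O B') on the host into the
    prophage-flanking sites attL (B O P') and attR (P O B').
    Assumes `core` occurs exactly once in each molecule and does not span
    the arbitrary start/end point of the phage string.
    """
    if phage_circle.count(core) != 1 or host.count(core) != 1:
        raise ValueError("core must occur exactly once in each molecule")
    i = phage_circle.index(core)
    # Open the phage circle immediately after its core: P' ... P (core removed).
    opened = phage_circle[i + len(core):] + phage_circle[:i]
    j = host.index(core) + len(core)
    # Result: B O | P' ... P | O B'
    return host[:j] + opened + core + host[j:]


# Hypothetical sequences: phage arms "aa"/"bb", host arms "xx"/"yy", core "OOO".
lysogen = integrate("aaOOObb", "xxOOOyy", "OOO")
print(lysogen)  # xxOOObbaaOOOyy
```

In the output, "xxOOObb" corresponds to attL and "aaOOOyy" to attR, and the lysogen is exactly as long as host plus phage, because the core is duplicated rather than lost. Imprecise reversal of this step is the excision route by which, as noted in the Introduction, temperate phages can pick up host genes.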

The Prochlorococcus P-SSP7 genome also contains a putative ribonucleotide reductase gene;
ribonucleotide reductase catalyzes the thioredoxin-mediated reduction of ribonucleoside
diphosphates (ADP, GDP, CDP and UDP) to the corresponding deoxyribonucleotides. Thus far,
ribonucleotide reductase genes among T7-like podovirus genomes have been observed only in
those from marine environments (Table 3). The roseophage SIO1 and cyanophage P60 genomes
contain one and two ribonucleotide reductase genes, respectively (Chen and Lu, 2002; Rohwer
et al., 2000). Those found in cyanophage P60 are most similar to cyanobacterial ribonucleotide
reductases (Chen and Lu, 2002), which are thought to be vitamin B12-dependent (Gleason and
Olszewski, 2002). In both of these phages and in Prochlorococcus phage P-SSP7, the putative
ribonucleotide reductase gene lies within a region homologous to the class II gene region of
coliphage T7, which is involved in nucleotide metabolism (Molineux, in press). The presence and
location of putative ribonucleotide reductase genes in all three marine T7-like podovirus genomes
sequenced to date suggest that nucleotide scavenging, such as the incorporation of nucleotides
from degraded host DNA that has been well characterized for many phage-host systems (Calendar,
1988), including roseophage SIO1 (Wikner et al., 1993), is particularly important for efficient
nutrient utilization during T7-like phage infection of nutrient-deprived marine hosts.

Host metabolic genes present in both myoviruses - required for myovirus infection of
Prochlorococcus?

In addition to core T4-like phage genes and the photosynthesis-related genes, both
myovirus genomes contain genes encoding proteins that are induced during phosphate stress
(pstS, phoH) (Kim et al., 1993; Scanlan et al., 1997; Torriani, 1990) or are involved in carbon
metabolism (tal) (Sprenger, 1995), the biosynthesis of cobalamin (cobS) (Lawrence and Roth,
1995; Maggio-Hall and Escalante-Semerena, 1999) and lipopolysaccharides, and the
modification of phage DNA (Miller et al., 2003b).

Phosphate  is  a  scarce  resource  in  the  oligotrophic  oceans  (Karl,  1999;  Wu  et  al.,  2000) 
where  Prochlorococcus  are  abundant.  Interestingly,  both  Prochlorococcus  myoviruses  contain  a 
phoH  gene,  which  in  E.  coli  is  induced  under  phosphate  stress  and  thought  to  encode  an  ATPase 
(Kim  et  al.,  1993).  The  phoH  gene  family  is  widely  distributed  among  prokaryotes  (Kazakov  et 
al.,  2003)  and  has  been  identified  in  two  other  marine  phage  genomes  (Miller  et  al.,  2003a; 
Rohwer  et  al.,  2000).  The  presence  of  the  phoH  gene  in  four  marine  phages  suggests  it  has  an 
important role under conditions where phosphate may be limiting. However, because the role of
PhoH is unknown, we can only speculate that it is involved in mobilizing phosphate, perhaps
through anabolic phospholipid metabolism (Kazakov et al., 2003).

Both myovirus genomes also contain pstS, which encodes a periplasmic phosphate-binding
protein involved in phosphate uptake (Wanner, 1996). In marine Synechococcus, endogenous
pstS is induced under limiting phosphate conditions (<50 nM inorganic phosphate) (Scanlan et
al., 1997). Hence, expression of a phage-encoded pstS gene might provide greater access to
phosphate pools during infection of phosphate-stressed cells. Although multiple Pho-Boxes
(promoter elements bound by the regulatory proteins of the pho regulon) were identified in
roseophage SIO1, none were identified in either Prochlorococcus myovirus genome using the
consensus sequence CTGTCATA[AT]A[AT]CTGT[CA]A[CT] suggested by Wanner (1996). This may
indicate that the machinery regulating phosphate-inducible genes in cyanobacteria recognizes a
different Pho-Box than in other bacteria. Alternatively, phosphate may always be limiting during
phage production, in which case the phage would gain nothing by regulating expression of a gene
that encodes a phosphate uptake protein.
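The consensus search described above amounts to a pattern scan of both DNA strands. A minimal sketch in Python follows, assuming the genome is available as a plain A/C/G/T string; the function names and the toy sequence are hypothetical and are not the software used for this analysis. Because the bracketed positions in the consensus are already valid regular-expression character classes, the pattern can be used directly, and the reverse strand is handled by scanning the reverse complement.

```python
import re

# Pho-Box consensus as quoted in the text (Wanner, 1996). Bracketed positions
# allow either base, which is also valid regular-expression syntax.
PHO_BOX = "CTGTCATA[AT]A[AT]CTGT[CA]A[CT]"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(genome: str, pattern: str = PHO_BOX) -> list[tuple[int, str, str]]:
    """Return (start, strand, site) for every exact match on either strand.

    Starts are 0-based forward-strand coordinates. The zero-width lookahead
    lets overlapping matches be reported instead of skipped.
    """
    genome = genome.upper()
    regex = re.compile(f"(?=({pattern}))")
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(genome)]

    rc = reverse_complement(genome)
    n = len(genome)
    for m in regex.finditer(rc):
        site = m.group(1)
        # A match at rc[i:i+k] covers genome[n-i-k : n-i] on the forward strand.
        hits.append((n - m.start() - len(site), "-", site))
    return sorted(hits)


if __name__ == "__main__":
    demo_site = "CTGTCATAAAACTGTCAC"  # one sequence that fits the consensus
    toy_genome = "GGGG" + demo_site + "TTTT"  # hypothetical toy sequence
    print(find_motif(toy_genome))
```

An exact-consensus scan of this kind tolerates no mismatches outside the bracketed positions, so a diverged site would not be reported.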

Finally, the presence of mazG in both Prochlorococcus myoviruses further suggests that
manipulation of nucleotides is important to infection of marine hosts by myoviruses. This gene is
widely distributed among prokaryotes (Zhang and Inouye, 2002; Zhang, Zhang, and Inouye,
2003) and is a member of a newly described gene family encoding nucleoside triphosphate
pyrophosphohydrolases, which hydrolyze nucleoside triphosphates such as GTP to the
corresponding monophosphates (e.g., GMP).

Cobalt, the central metal atom of cobalamin, is also found in extremely low (pM) concentrations
in seawater (Saito and Moffett, 2002) and is required for growth of Prochlorococcus (Saito et al.,
2002). Both Prochlorococcus myoviruses contain the gene cobS, which in bacteria encodes a
protein that catalyzes the final step in cobalamin (vitamin B12) biosynthesis. The marine
cyanobacterial genomes contain an intact cobalamin pathway including cobS (E. Webb, pers.
comm.), which supports the anecdotal evidence that these cyanobacteria make their own vitamin
B12, since it is not required in their growth medium. Both myoviruses encode ribonucleotide
reductases that are most similar to those found in the marine cyanobacteria, which are thought to
require cobalamin (B12) as a cofactor (Gleason and Olszewski, 2002). Thus, cobalamin
production during Prochlorococcus phage infection could be important for supporting the
ribonucleotide reductase activity needed for nucleotide metabolism. Alternatively, overexpression
of the cobS gene has recently been shown to induce the phage shock protein PspA, known to
destabilize membranes in Salmonella (Escalante-Semerena, pers. comm.). Could expression of
phage-encoded cobS also provide a novel mechanism for cell lysis upon completion of lytic phage
production?

Both Prochlorococcus myovirus genomes contain putative N6-adenine methyltransferases and
N4-cytosine methyltransferases (Table 4). In T4, such enzymes modify the phage DNA to prevent
its degradation by host restriction enzymes upon infection and can allow specific recognition of
phage DNA by host RNA polymerase during early transcription (Miller et al., 2003b). Thus, it is
likely that both Prochlorococcus myovirus genomes contain modified bases for similar reasons.
Notably, these modification enzymes are sporadically distributed among the T4-like phages
(Table 4) and, in particular, are absent from vibriophage KVP40, where it is unknown how the
phage DNA is differentiated from host DNA in early transcription (Miller et al., 2003a).

Taken together, the presence of these metabolic genes in both Prochlorococcus myovirus
genomes suggests numerous links between phage and host metabolism that may be critical to
successful phage production.

Metabolic genes present in one but not both myoviruses - genes transiently passing through or
necessary genes for specialized phage-host interactions?

There are many genes in these Prochlorococcus myovirus genomes that are found in only
one of the two and are typically absent from all other T4-like phages. The P-SSM2 genome
contains a significant number of genes unique to this myovirus (Table 4). The presence of five
genes involved in purine (purH, purL, purM, purN) and pyrimidine (pyrE) biosynthesis suggests
that this phage infects hosts in which de novo nucleotide synthesis limits phage production.
Indeed, this is one of the largest T4-like phage genomes sequenced to date (252 kb), and it infects
hosts that have small (~1.7 Mb) genomes and live in highly oligotrophic conditions. By contrast,
the similarly large (255 kb) vibriophage KVP40 genome belongs to a phage that infects Vibrio
hosts with large (~3-4 Mb) genomes.
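Taking the approximate genome sizes quoted above at face value, the size contrast can be expressed as a ratio of phage to host genome size (a back-of-the-envelope comparison, not a calculation reported in this study):

```latex
\frac{252\ \text{kb}}{\sim 1700\ \text{kb}} \approx 0.15
\qquad \text{versus} \qquad
\frac{255\ \text{kb}}{\sim 3000\text{--}4000\ \text{kb}} \approx 0.064\text{--}0.085
```

That is, the P-SSM2 genome is roughly 15% of the size of its host's genome, whereas the KVP40 genome is roughly 6 to 9% of the size of its hosts' genomes.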

Further, three phage-encoded clusters containing a total of 35 LPS genes (e.g., epimerases,
transferases, phospholipases) were found in this phage genome. Phage-encoded LPS genes have
previously been reported only from temperate phages, which use these genes upon infection and
establishment of the prophage state to alter the cell-surface composition of the host (serotype
conversion) and so prevent superinfecting phages from attaching (thus conferring homoimmunity)
(Calendar, 1988). While it is unlikely that this T4-like Prochlorococcus myovirus is temperate (it
lacks an integrase gene, and no temperate T4-like phage is known), there is reason to believe that
these LPS gene clusters could still be functional and important to the phage. For example, the
Prochlorococcus myovirus is likely to have a long lytic cycle relative to other T4-like lytic phages
(extrapolating from Synechococcus phage lytic cycle lengths; Suttle and Chan, 1993), so it is
plausible that the phage LPS genes might be expressed early during infection to alter the cell
surface and provide similar exclusion of superinfecting phages. The presence of phospholipases
also raises the possibility that LPS is degraded during infection to supply nutrients for phage
production.
Finally, this phage genome contains a gene (prnA) that encodes a tryptophan halogenase,
which catalyzes the NADH-consuming first of the four steps in converting tryptophan to the
antibiotic pyrrolnitrin (Hammer et al., 1999; Hammer et al., 1997; Kirner et al., 1998). The fact
that this gene is full-length suggests that it might encode a functional protein. However, the other
steps in pyrrolnitrin biosynthesis require enzymes encoded by the prnBCD genes (Hammer et al.,
1999), and none of these genes, including prnA, is found in marine cyanobacterial genomes. If we
assume that the intermediate produced by the PrnA-catalyzed reaction is of no use to the phage,
then the lack of these other genes suggests that the phage-encoded prnA gene is not used during
infection of marine cyanobacterial hosts and was likely acquired either directly from a
non-cyanobacterial host or through the fluid phage genome pool (Hendrix et al., 1999).

The other Prochlorococcus myovirus genome, P-SSM4, also contains some unique genes,
although these have less obvious roles in facilitating phage production. This genome encodes two
proteins with signaling domains, a carboxylesterase and a class II cAMP phosphodiesterase.
Based upon their function in other organisms, they likely play a role in the modification of carbon
compounds and in nucleotide metabolism, both of which are important for phage production
during infection. It is also noteworthy that this phage genome contains a group of genes with little
homology to known bacterial proteins but with homology to eukaryotic prion-like proteins (e-5),
an archaeal protease (e-6) and a hypothetical protein from a eukaryotic slime mold (e-4), all
located near each other (Fig. 4). Similar observations of eukaryotic and prion-like genes have
been made in the genomes of mycobacteriophages (Pedulla et al., 2003). The presence of these
genes in a Prochlorococcus myovirus, where they are unlikely to have a role, lends support to
lateral transfer of genes to and from a common phage genome pool (Hendrix et al., 1999); in this
case, gene transfer has crossed the domains of life. Alternatively, such genes might be
misidentified because of weak homology to genes of known function, and may actually be
important to particular phage-host interactions in their prokaryotic hosts.

Finally, one ORF in this phage has homology (e-33) to a possible hemagglutinin-neuraminidase
in Prochlorococcus. Among viruses, such genes have previously been limited to the ssRNA viruses
that commonly infect humans. In these ssRNA viruses, the hemagglutinin-neuraminidase protein
binds sialic acid-containing receptors on the host cell and also cleaves sialic acid from
glycoproteins. Glycoproteins, composed of polysaccharides and proteins, commonly serve as
cell-surface receptors on the external surface of the membrane, where they are in contact with the
environment (Madigan, Martinko, and Parker, 2003). Such a gene could play a similar role in this
Prochlorococcus phage, allowing attachment to a receptor.

The sporadic distribution of these genes suggests either that we are observing their
transient passage through the phage genomic pool or that they are functional and maintained by
selection. If they are transiently passing through the genome, one might expect them to be
degraded over time by neutral mutation. However, if genes do not appear to be degraded, as is the
case for the genes presented here, one might assume either that they are relatively recent
acquisitions from hosts in which they were functional, or that they are important to this particular
phage and functional during some stage of its life cycle.

‘Missing’  genes 

The fact that some genes required for T7 and T4 phage lifestyles are absent from our
Prochlorococcus phage genomes suggests two distinct possibilities. First, these genes may
have  significantly  diverged  from  known  phage  genes  thus  preventing  detection  through  homology 
searches.  It  is  plausible  that  many  of  these  missing  genes  are  present  in  our  phages  but  more 
difficult  to  identify  due  to  the  limited  sampling  of  cyanophage  genomes  to  date  (Chen  and  Lu, 
2002).  If  these  genes  are  present,  they  may  have  lost  sequence  homology  through  significant 
sequence  divergence  of  an  ancient  homologue  within  this  under-sampled  phage  lineage  or  due  to 
the  replacement  of  a  gene  with  a  functional  equivalent  from  a  foreign  source  (termed  non- 
orthologous  displacement)  (Forterre,  1999).  Alternatively,  these  genes  may  not  be  required  for 
infection  of  Prochlorococcus  hosts  and  may  be  degraded  by  neutral  mutations  without  selection 
for  sequence  conservation. 

The podovirus P-SSP7 genome is missing a suite of genes found in other T7-like phages
(Table 3). The protein products of these genes are likely host-specific, as they are involved in
protecting phage DNA against host restriction enzymes (gene 0.3), in specific interactions
between the phage and host cell wall during formation of the virion structure (e.g., genes 6.7,
7.3, 13), and in lysis (gene 3.5, which encodes the lysis-mediating amidase that also regulates T7
RNAP activity) (Molineux, in press). These genes have yet to be identified in the other marine
podovirus genomes, cyanophage P60 and roseophage SIO1 (Table 3). Thus, these genes are
either significantly diverged from well-characterized genes (e.g., structural genes) or not
required for infection of these marine hosts (e.g., anti-restriction proteins specific for particular
hosts).

Analyses of both the P-SSM2 and P-SSM4 myovirus genomes also reveal some missing
genes whose loss may be a consequence of the specific marine host's lifestyle. First, both
Prochlorococcus myoviruses lack the genes for anaerobic nucleotide synthesis pathways
(nrdD, nrdG, nrdH) that are commonly found in T4-like phages (including the marine
vibriophage KVP40). Although Prochlorococcus have been observed in oxygen minimum
zones in the Arabian Sea (Johnson et al., 1999), it is unlikely that Prochlorococcus cells in the
surface waters of the Sargasso Sea are exposed to anaerobic conditions, so anaerobic nucleotide
synthesis genes would be of little use to these phages. Second, the scarcity of tRNA genes in the
Prochlorococcus myoviruses (there is only one tRNA gene between them; Table 4) is striking,
because the other T4-like phages encode an average of ~16 tRNA genes per genome (range:
5-32; Table 4). In T4, eight tRNA genes supplement host tRNA species that are present in minor
amounts but that recognize codons common in the phage (Kunisawa, 1992). In contrast,
vibriophage KVP40, which encodes 30 tRNAs, has codon usage similar to that of its host,
Vibrio cholerae, so it was suggested that the presence of so many tRNAs might reflect a broad
host range for KVP40 (Miller et al., 2003a). The lack of tRNA genes in the Prochlorococcus
phages suggests that the codon usage of these phage genomes more closely matches that of their
range of host genomes.
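The codon usage argument above could be checked by comparing codon frequencies in phage and host protein-coding genes. The sketch below is one simple way to do so and is not an analysis performed in this study: it pools in-frame coding sequences, computes the relative frequency of each of the 64 codons, and reports the total variation distance between the phage and host profiles. The function names and toy sequences are hypothetical, and total variation distance is chosen here only for simplicity; indices such as relative synonymous codon usage or the codon adaptation index are more commonly used for this purpose.

```python
from itertools import product

# All 64 codons in a fixed order so frequency tables are directly comparable.
CODONS = ["".join(triplet) for triplet in product("ACGT", repeat=3)]


def codon_frequencies(coding_sequences: list[str]) -> dict[str, float]:
    """Relative frequency of each codon across in-frame coding sequences.

    Assumes each sequence starts at its first codon; a trailing partial
    codon and any codon containing a non-ACGT character are skipped.
    """
    counts = dict.fromkeys(CODONS, 0)
    for cds in coding_sequences:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in counts:  # only the 64 A/C/G/T codons are keys
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons found")
    return {codon: n / total for codon, n in counts.items()}


def codon_usage_distance(phage_cds: list[str], host_cds: list[str]) -> float:
    """Total variation distance between two codon usage profiles.

    0.0 means identical codon usage; 1.0 means no codons in common.
    """
    p = codon_frequencies(phage_cds)
    q = codon_frequencies(host_cds)
    return 0.5 * sum(abs(p[c] - q[c]) for c in CODONS)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real phage or host genes.
    print(codon_usage_distance(["ATGAAAGCT"], ["ATGAAAGCTTAA"]))
```

A small distance between a phage and its hosts would be consistent with the suggestion that these phages need few tRNA genes of their own, but would not by itself establish it.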

Finally, it should be noted that many virion structural genes have yet to be identified in
these Prochlorococcus myovirus genomes. Electron micrographs (Fig. 1B, C) make clear that
these genomes must encode proteins functionally homologous to the structural proteins of T4-like
phages. In fact, there are many candidate ORFs, based upon size and location in the genome, that
may eventually be identified as encoding these structural proteins. It is likely that, as was found
for the other marine T4-like phage (vibriophage KVP40) (Miller et al., 2003a), many of these
structural genes are significantly diverged, so standard homology-based searches will not
identify them. More sophisticated approaches targeting regions of homology, coupled with
laboratory work, will greatly aid in this process.
Evolutionary fit enzymes?

The few ‘host’ genes found in these Prochlorococcus phages are predominantly
cyanobacterial genes that are also found in the host genomes as part of much larger pathways. If
we assume that some (or all) of these phage-encoded genes are functional, then one must wonder
why only particular genes in a much larger metabolic pathway are of value. In the oligotrophic
environment where Prochlorococcus and their phages are abundant, it is likely that metabolic
pathways that determine the use of key nutrients such as nitrogen, phosphorus and carbon are
strictly controlled (Scanlan and West, 2002). One plausible way for host cells to control such
pathways is to modulate the activity of key pathway enzymes through transcriptional regulation
over short time scales. For example, the transaldolase gene is not expressed in cyanobacteria
while photosynthesis is occurring (Schmetter, 1994). However, if the phage-encoded
transaldolase gene were resistant to this regulation, and thus constitutively expressed, it could
give the phage constant access to important metabolites during infection. Alternatively, over
evolutionary time scales, there could be selection in the host for “control point” enzymes with
decreased activity, so that these enzymes act as bottlenecks in a pathway (Watt and Dean, 2000).
If there were selection for such control point enzymes with reduced activity, could phage-encoded
versions of these same enzymes offer a more efficient form of the enzyme to maximize access to
key metabolites during infection? This remains an open question.
CONCLUSIONS


Phage populations are genetically dynamic: they must not only maintain their ability to
infect their hosts but also be able to manipulate their hosts finely to enable optimal phage
replication. The three Prochlorococcus phage genomes analyzed here provide an extensive
demonstration of this intimate connection. First, all three Prochlorococcus phage genomes,
representing two morphological families, contain key photosynthesis genes that are likely
required to maintain the photosynthetic activity of the hosts during infection (Lindell et al.,
submitted; Mann et al., 2003; Millard et al., submitted), which is critical to successful phage
production (Sherman, 1976). Second, both myoviruses contain genes involved in key metabolic
pathways, suggesting that the phosphate stress response, cobalamin biosynthesis, carbon-store
mobilization and modification of nucleotides are important for infection of Prochlorococcus by
myoviruses. Third, the presence of other metabolic genes in one but not both of the myoviruses
may indicate genes that are important to the specialized interactions between a particular phage
and its host. Fourth, the integrase gene in the podovirus points to an adaptive strategy
demonstrated in many other phage types, but not in the classically lytic T7-like and T4-like
phages. If the Prochlorococcus P-SSP7 integrase is functional, this has significant evolutionary
and ecological implications for the phage. Finally, these genomes support the model that phages
evolve by gene exchange, both through differential access to a global phage gene pool and with
their hosts. The gene complements in these genomes suggest that these phages obtained genes
from this global phage gene pool as well as from their host genomes, while maintaining the core
group-specific genes of the T7-like and T4-like phages. Understanding the mechanisms of this
complex genetic shuffling will be a vital next step in our understanding of phages (and their hosts)
in the biosphere, as will an understanding of the functional consequences of this gene diversity.
The proposed roles of many of the phage and host genes documented in these phage genomes
await laboratory experiments designed to examine their expression and, if expressed, eventual
protein localization and interactions, to determine whether these genes are transiently passing
through or play important functional roles during infection.
Figure 1: Electron micrographs of Prochlorococcus phages: (A) podovirus P-SSP7, (B) myovirus
P-SSM2,  (C)  myovirus  P-SSM4.  Concentrated  phage  stocks  were  stained  with  2%  uranyl 
acetate.  Note  the  distinct  T4-like  baseplate  and  tail  tube,  sheath  and  fibers  in  both  myoviruses. 
Scale  bars  indicate  100  nm.  Photo  credit:  Peter  Weigele. 

Figure  2:  Genome  arrangement  of  the  Prochlorococcus  podovirus  phage  P-SSP7.  ORFs  are 
sequentially  numbered  within  the  boxes  and  gene  names  are  designated  above  the  boxes.  Gene 
designations  are  as  per  T7  nomenclature  for  T7-like  genes  (Molineux,  in  press)  or  as  per 
microbial gene designations for non-phage genes. All ORFs in this genome are oriented in the
same direction. Although the phage genome is a single DNA molecule, for viewing purposes the
genome representation is broken to fit on a single page. Colors indicate the class I (light green),
II (yellow) and III (light blue) genes as inferred from T7 phage. Note that many T7 genes and
their genome order are conserved.

Figure  3:  Genome  arrangement  of  the  Prochlorococcus  myovirus  phage  P-SSM2.  Gene  names 
are  designated  above  the  box  representing  the  ORF  where  genes  were  identified.  If  the  ORF  is 
located  above  the  centering  line  then  it  is  on  the  forward  strand  of  DNA,  while  an  ORF  located 
below the line indicates reverse strand DNA. Although the phage genome is a single DNA
molecule, for viewing purposes the genome representation is broken to fit on a single page. Colors indicate
the  putative  role  for  the  identified  genes  as  inferred  from  T4  phage.  Unknown  genes  =  White, 
'Non-phage'  genes  =  Pink,  Transcription  /  Translation  =  red,  nucleotide  metabolism  =  yellow, 
DNA  replication,  recombination,  repair  =  orange,  head  =  dark  blue,  tail  =  light  blue,  purple  = 
presumed  lipopolysaccharide  gene  clusters,  other  phage  genes  =  green.  Gene  designations  are  as 
per  T4  gene  nomenclature  for  T4-like  genes  (Miller  et  al.,  2003b)  or  as  per  microbial  gene 
designations for non-phage genes. Note that the large gene gp7 is predicted to be over 7000
amino acids long.

Figure 4: Genome arrangement of the Prochlorococcus myovirus phage P-SSM4. Color coding,
ORF strand placement and gene nomenclature as in Figure 3.
Figure 2
Figure 4
Table 1: Genome-wide characteristics of the Prochlorococcus cyanophage P-SSP7 relative to the other
recognized phage groups within the Podoviridae (van Regenmortel et al., 2000). Categories include: type
of host infected by the phages, genome size (kb), number of open reading frames (ORFs), and the presence
or absence of terminal repeats (TRs), an RNA polymerase gene (RNAP) and a putative site-specific
integrase gene (INT). Y indicates that the feature is present; N indicates that the feature is absent.

Phage    Host(s)          Size (kb)  # ORFs  TRs  RNAP  INT
T7-like  Gram negatives   38-43      43-56   Y    Y     N
φ29-like Gram positives   18-22      17-35   N    N     N
P22-like Gram negatives   38-50      60-65   N    N     Y
P-SSP7   Prochlorococcus  43         53      Y    Y     Y

Table 2: Genome-wide characteristics of the Prochlorococcus cyanomyophages P-SSM2 and P-SSM4
relative to the other recognized phage groups within the Myoviridae (van Regenmortel et al., 2000).
Categories include: the type of hosts infected by the phages, genome size (kb), number of open reading
frames (ORFs), the presence or absence of an integrating phage element in the life cycle (INT), and DNA
conformation. Y indicates that the feature is present, N indicates that the feature is absent, and Y* indicates
that the phage integrates using a transposase rather than a site-specific integrase. ? indicates that no
representative phage genome has been completely sequenced, so the presence or absence of the character
is unknown.

Phage     Host(s)          Size (kb)  # ORFs   INT  DNA conformation
T4-like   Gram negatives   164-255    252-384  N    circularly permuted
P1-like   Gram negatives   <50        40       ?    circularly permuted
P2-like   Gram negatives   30-34      40-44    N    circularly permuted
Mu-like   Gram negatives   37         55       Y*   linear
SPO1-like Gram positives   <50        40       ?    linear
φH-like   Archaea          58-78      98-121   Y    circularly permuted
P-SSM2    Prochlorococcus  252        327      N    circularly permuted
P-SSM4    Prochlorococcus  178        198      N    circularly permuted
Table 3: ‘Core’ genes in T7-like phages (modified from Molineux, in press). The size (amino acids) of
each translated gene is presented for the genes classified using gene numbers according to T7 terminology.
For P-SSP7, the size (amino acids) of each gene is followed by the best T7-like or microbe-related e-value
in parentheses, with no e-value given for ORFs that were assigned using domain homology and synteny.
The T7 supergroup is divided into T7-like phages (T7, T3, gh-1, φYeO3-12, φA1122), the distant T7-like
phages (e.g., P60) and the very distant T7-like phages (e.g., SIO1) (Molineux, in press). Representative
phage(s) are included from each grouping for comparison. A dash (-) indicates the lack of a particular gene
using standard searches, * indicates a split gene, and ** indicates a putative frameshift in the gene.

Gene   P-SSP7 (e-value)  T7    T3    gh-1  φYeO3-12  φA1122  P60      SIO1  Notes
Class I
0.3    -                 117   153   -     153       141     -        -     B-DNA mimic; anti-type I restriction
0.7    125 (e-18)        359   370   -     370       -       -        -     protein kinase to shut off host transcription
1      779 (e-91)        883   885   886   885       884     574      -     RNA polymerase
1.1    -                 42    46    -     47        44      -        -     conserved; not essential
1.2    -                 85    92    -     92        86      -        -     host dGTPase inhibitor; F-exclusion
1.3    -                 359   347   -     347       341     -        -     DNA ligase
Class II
1.7    -                 196   164   -     157       -       -        -     full-length gene not conserved; beneficial for growth
2      -                 64    55    56    79        65      -        -     host RNA polymerase inhibitor
2.5    190 (e=0.004)     232   233   234   233       233     -        -     SSB
3      117 (e-19)        149   153   148   154       152     -        135   endonuclease I; Holliday junction resolvase
3.5    -                 151   152   147   152       152     -        -     amidase (lysozyme); regulates T7 RNAP activity
4A     521 (e-132)       566   567   563   567       567     531      523   primase-helicase; gp4B helicase has internal in-frame start
4.3    -                 70    70    -     71        71      -        -     conserved, non-essential
4.5    -                 89    94    -     95        90      -        -     conserved, non-essential
5      589 **            704   705   710   705       705     587      581   DNAP
5.7    -                 69    69    70    70        70      -        -     conserved, non-essential
6      260 (e-44)        300   303   315   304       301     243      -     5'-3' dsDNA exonuclease; RNase H
Class III
6.5    -                 84    81    81    82        85      -        -     conserved, non-essential
6.7    -                 88    83    91    84        89      -        -     essential virion ejected protein
7      -                 133   107   -     -         134     -        -     non-essential, not conserved; host range
7.3    -                 99    -     -     107       80      -        -     essential virion ejected protein
8      523 (e-171)       536   536   544   536       537     555      -     head-tail connector protein
9      266 (e-50)        307   311   292   311       305     -              scaffolding protein
10A    376 (e-40)        345   348   348   348       345     221      -     major capsid protein; -1 frameshift yields minor capsid protein gp10B; F-exclusion
11     205 (e-23)        196   197   196   197       197     192 *    -     tail protein
12     977 (e-58)        794   802   809   802       795     680      -     tail protein
13     -                 138   137   145   139       139     -        -     essential; required for gp6.7 incorporation into virion
14     -                 196   198   194   198       197     -        -     internal core protein; ejected into infected cell
15     838               747   749   739   748       748     -        -     internal core protein; ejected into infected cell
16     1246              1318  1319  1393  1321      1319    -        -     internal core protein; ejected into infected cell
17     716 (e-10)        553   559   619   646       559     -        -     tail fiber protein
17.5   -                 67    67    72    68        68      -        -     class II holin
18     111 (e-11)        89    89    86    89        90      -        -     small terminase subunit
18.5   -                 143   148   150   151       148     -        -     conserved; lambda Rz-Rz1 homologs
18.7   -                 83    83    -     85        84      -        -     conserved; lambda Rz-Rz1 homologs
19     578 (e-121)       586   587   583   588       587     566 *    -     large terminase subunit
19.2   -                 85    147   -     78        78      -        -     overlappon, conserved
19.3   -                 57    57    -     43        58      -        -     overlappon, conserved
19.5   -                 49    49    -     50        50      -        -     non-essential, conserved
Others
phoH   -                 -     -     -     -         -       -        385   phosphate-stress inducible; putative RNA helicase
nrd    469 (e-11)        -     -     -     -         -       406/446  672   ribonucleotide reductase domain
hli    64 (e-11)         -     -     -     -         -       -        -     high-light inducible
psbA   360 (e=0)         -     -     -     -         -       -        -     gene encoding D1 protein of PSII reaction center
frd  -  -  -  -  193  195  189  184  181  172  Dihydrofolate reductase; product used by td
nrdD  -  -  606  605  620  608  611  704  Anaerobic NTP reductase
nrdG  -  -  -  -  156  156  167  -  158  159  Anaerobic NTP reductase
td  212 e-84  211 e-78  286  238  410  279  300  111  Thymidylate synthase
nrdH  -  -  -  102  90  89  91  79  94  Anaerobic glutaredoxin
CHAPTER VI. SUMMARY AND FUTURE DIRECTIONS


SUMMARY 

This thesis describes the first isolation of Prochlorococcus cyanophages and the
establishment of a cyanophage culture collection, isolated using a phylogenetically diverse
selection of Prochlorococcus and Synechococcus hosts (Sullivan, Waterbury, and Chisholm,
2003)(Chapter 2). This phage collection contains all three phage morphological families
previously described for Synechococcus (Lu, Chen, and Hodson, 2001; Suttle and Chan, 1993;
Waterbury and Valois, 1993; Wilson et al., 1993). The breadth of the collection also revealed a
relationship between the host of isolation and the resulting phage morphology. Specifically, HL
Prochlorococcus strains yielded exclusively Podoviridae, while LL Prochlorococcus and
Synechococcus yielded primarily Myoviridae. Host-range analyses of 21 host strains and 45
cyanophage isolates demonstrated varying levels of specificity among phage morphological
types. Podoviridae commonly infected a single strain, while Myoviridae commonly infected
across ecotypes and even across the two cyanobacterial genera (Sullivan, Waterbury, and
Chisholm, 2003). Such trends are difficult to interpret. They may reflect variable host
properties, such as differences in cell surface characteristics and/or in anti-phage defense
mechanisms (e.g., R-M systems, blocking proteins). They may also reflect differences in phage
properties, such as tail fiber switching and/or tail fiber gene copy number (see below).

For example, host range may be controlled by how cell surface molecules are distributed among
host ecotypes, since phages use these molecules as receptors to gain entry to their hosts.
Following this logic, one possible explanation for the observed host-range patterns is that
Podoviridae use the highly variable O-antigen of the lipopolysaccharide (LPS) as a receptor. In
contrast, Myoviridae may use the less variable core oligosaccharides of the LPS or outer membrane
proteins (OMPs), which would allow them to infect a broader range of hosts. A second explanation,
not mutually exclusive with the first, is that the tail fibers of the two phage types do not
recognize different receptor types at all. Tail fibers determine which host receptor a phage binds
(Henning and Hashemolhosseini, 1994). Host range might therefore be set by swapping the
specificity determinant from one phage to another (termed tail fiber switching) (Tetart, Desplats,
and Krisch, 1998; Tetart et al., 1996), or by a single phage genome encoding multiple tail fiber
genes (Miller et al., 2003). The mechanisms of host specificity remain among the most pressing
unknowns in marine phage-host ecology.
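A binary host-range table like the one in Chapter 2 can be summarized by morphological family in a few lines of code. The Python sketch below is illustrative only. The phage names, host labels, family assignments and infection calls in `HOST_RANGE` and `FAMILY` are hypothetical placeholders, not data from this thesis. The `genus()` helper also assumes a naming convention in which each host label begins with its genus.

```python
from statistics import median

# Hypothetical host-range matrix: phage -> set of host strains it lyses.
# All names and infection calls are placeholders, not thesis data.
HOST_RANGE = {
    "podo_1": {"Pro_HL_a"},
    "podo_2": {"Pro_HL_b"},
    "myo_1": {"Pro_LL_a", "Syn_a", "Syn_b"},
    "myo_2": {"Pro_LL_a", "Pro_LL_b"},
}
FAMILY = {"podo_1": "Podoviridae", "podo_2": "Podoviridae",
          "myo_1": "Myoviridae", "myo_2": "Myoviridae"}


def genus(host: str) -> str:
    """Assumed naming convention: the label prefix before '_' is the host genus."""
    return host.split("_")[0]


def summarize(host_range: dict, family: dict) -> dict:
    """Return {family: (median number of hosts lysed, number of cross-genus phages)}."""
    breadth: dict = {}
    cross: dict = {}
    for phage, hosts in host_range.items():
        fam = family[phage]
        breadth.setdefault(fam, []).append(len(hosts))
        spans_genera = len({genus(h) for h in hosts}) > 1
        cross[fam] = cross.get(fam, 0) + int(spans_genera)
    return {fam: (median(sizes), cross[fam]) for fam, sizes in breadth.items()}


if __name__ == "__main__":
    for fam, (med, n_cross) in summarize(HOST_RANGE, FAMILY).items():
        print(f"{fam}: median hosts lysed = {med}, cross-genus phages = {n_cross}")
```

The real matrix (21 hosts by 45 isolates) could be loaded into the same dictionary shape. Ecotype labels could then replace genus to test cross-ecotype infection in the same way.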

Next, we focused on one group of viruses in the collection, the Myoviridae. Our goal was to
compare the phage diversity captured in our collection with that found in other studies of this
family. Diversity within the Myoviridae has previously been characterized using g20, the gene
that encodes the portal protein (Sullivan et al.)(Chapter 3). g20 sequences from our phage
collection grouped with those of previously cultured isolates. They did not group with the
environmental g20 clusters that lack cultured representatives. This suggests that current
myophage isolates, including those in our collection, do not capture all of the g20 diversity
observed in the environment.

In addition, methodological complications encountered in our work indicated that previous
studies of environmental g20 diversity may also have missed myophage groups. g20 PCR screening
of our collection was hindered because the published g20 primers could not amplify sequences from
all myophages in the collection. We therefore redesigned the existing primers to produce a new
primer set that amplified g20 sequences from every myophage in the collection. Once we could map
the g20 diversity of the isolates adequately, we found that this index of diversity showed little
relationship to phage properties such as host range and geographic location of isolation.

Together, this work suggests two things. First, previous environmental g20 diversity studies have
likely missed the g20 diversity of many myophages. Second, g20 may not be the most appropriate
marker for tracking phages specific to particular hosts (e.g., in studies of phage-host
coevolution). Future research might use the tail fiber genes as a more ecologically informative
diversity proxy for this phage family, since tail fiber genes are known to be the host range
determinant in these phages (Henning and Hashemolhosseini, 1994).
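Whether a degenerate primer is likely to bind a given template can be screened computationally before any PCR is run. The sketch below compares a primer written in IUPAC ambiguity codes with a target site on the same strand and reports mismatches. The primer and target sequences in the usage example are hypothetical. The rule in `likely_to_amplify` (a few mismatches overall, none in the 3'-terminal bases) is a common rule of thumb and an assumption here. It is not the criterion used to redesign the g20 primers.

```python
# IUPAC nucleotide ambiguity codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer: str, site: str) -> list:
    """0-based positions where the target site (same strand and orientation as the
    primer) is not matched by the degenerate primer."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return [i for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
            if s not in IUPAC[p]]


def likely_to_amplify(primer: str, site: str,
                      max_total: int = 2, three_prime_window: int = 3) -> bool:
    """Rule-of-thumb screen (an assumption, not a validated criterion): tolerate a few
    mismatches overall, but none in the 3'-terminal bases, where mismatches most
    strongly impair extension by the polymerase."""
    mm = mismatches(primer, site)
    clamp_start = len(primer) - three_prime_window
    return len(mm) <= max_total and all(i < clamp_start for i in mm)


if __name__ == "__main__":
    primer = "GARGAYTTYGC"  # hypothetical degenerate primer
    for site in ("GAAGATTTCGC", "GAAGCTTTCGC", "GAAGATTTCGA"):
        print(site, mismatches(primer, site), likely_to_amplify(primer, site))
```

Run against primer sites extracted from an alignment of all isolates, a screen like this would flag which isolates a published primer is likely to miss.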

To examine the abundance and diversity of cyanophages in the field using culture-based
assays, we used a broad selection of host strains to quantify lytic cyanophage titers along a
coastal-to-open ocean transect. These assays showed that cyanophage titers were lower in
open ocean environments than in coastal environments (Sullivan, Waterbury, and Chisholm,
2003)(Chapter 2). Because the phage-host interaction is complex, definitive explanations for
such trends remain elusive. However, several factors could be involved.

One possibility concerns host microdiversity. If host strain microdiversity increased along the
transect while cross-infection ability did not increase with it, phage titers on any single host
strain would be lower (Thingstad, 2000). A second possibility is decreased nutrient availability
in the oligotrophic oceans (Cavender-Bares, Karl, and Chisholm, 2001). This would result in
sub-optimal growth of host cells in the Sargasso Sea (Mann et al., 2002) relative to
Synechococcus at the coastal site (Waterbury et al., 1986). Viral production is correlated with
host growth rates both in chemostats (Bohannan and Lenski, 2000) and in the field (Steward,
Smith, and Azam, 1996). Such a correlation might arise because nutrient limitation causes
physiological changes in the host. These changes could stall the lytic process of obligately
lytic phages (Stent, 1963) or favor lysogeny in temperate phages (Rohwer et al., 2000; Suttle,
2000; Wilson, Carr, and Mann, 1996). Because we cannot yet specifically quantify the phages that
infect particular hosts, it is not yet possible to clarify the mechanisms underlying these
observations.

We used genome sequences to explore further the potential interactions between phage and host
during cyanophage infection. Through collaborations with the Department of Energy Joint Genome
Institute, we analyzed the genomes of three hosts (Palenik et al., 2003; Rocap et al.,
2003)(Appendix B) and three Prochlorococcus cyanophages (Chapters 4 and 5). The three
cyanobacterial genomes (Prochlorococcus MED4 and MIT9313, and Synechococcus WH8102) contain no
intact prophage. This is notable because prophages are common in prokaryotic hosts (Canchaya et
al., 2003; Casjens, 2003), and field data suggest that prophages can be induced from natural
Synechococcus populations (McDaniel et al., 2002; Ortmann, Lawrence, and Suttle, 2002).

One explanation is that any such prophages were induced from our cultured host cells by the
stressful conditions encountered during 10+ years of culturing. We cannot rule out that the
induced phages later reintegrated. However, they may instead have become non-infective through
adsorption to non-productive hosts or host particles in cultures left in prolonged stationary
phase.

Two lines of evidence from genome sequencing suggest that prophages may once have existed in
these genomes. First, many intact and fragmented site-specific integrase genes occur in all three
genomes (Appendix B)(Palenik et al., 2003). Second, the podovirus P-SSP7 contains an integrase
gene (Chapter 5). If this gene is functional, temperate phages do occur among the marine
cyanobacteria.

Analyses of the three Prochlorococcus cyanophage genomes (Chapters 4 and 5) suggest that these
phages are related to two well-characterized groups: the T7-like phages (P-SSP7) and the T4-like
phages (P-SSM2 and P-SSM4). They nonetheless appear optimized for infecting photosynthetic hosts,
specifically hosts that thrive in oligotrophic environments (Sullivan et al.). All three genomes
contain genes encoding core photosynthetic proteins. These genes are full-length, conserved and
clustered in the genome (Lindell et al., submitted)(Chapter 4). We hypothesize that such sequence
conservation results from selection. If so, these genes are expressed and functional during
infection, and they offer an advantage to the phage and/or host that has led to their fixation in
the phage genomes.

In addition, the genomes of both myoviruses, P-SSM2 and P-SSM4, contain conserved, full-length
genes for proteins involved in:
- phosphate stress,
- the mobilization of carbon stores, and
- the synthesis of cobalamin, purines, LPS and several accessory pigments.

Together, these genomic observations suggest specific ways in which cyanophages may alter,
enhance and co-opt the metabolic capabilities of their hosts. Finally, the integrase gene in the
podovirus P-SSP7 genome suggests that this phage can integrate into its host genome. If further
experiments provide direct evidence for this, P-SSP7 will be the first temperate T7-like phage and
the first temperate marine cyanophage in culture.

FUTURE  DIRECTIONS 

To understand the ecology of cyanophages and their hosts more broadly, we need to pinpoint the
mechanisms controlling their interactions. Indeed, a mechanistic understanding of microscale
interactions is needed to understand all of the microbial processes fundamental to the global
biogeochemistry of the oceans (Azam and Worden, 2004). Mechanistic information can be
incorporated into models of phage-host population dynamics. Model outputs can then be compared
with real-world data, such as the host-range information and transect titers presented in this
thesis. Through this comparison, the models can be tested, their parameters refined, and the
remaining gaps in our understanding highlighted. In this way, genomics, genetics, biochemistry,
field studies and computer modeling will all play important roles in furthering our
understanding of marine cyanophage ecology.
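As one minimal example of the kind of model meant here, the sketch below tracks three populations with forward Euler steps: susceptible hosts (S), infected hosts (I) and free phage (P). Both the model structure and the parameter values are assumptions chosen for illustration, not parameters measured for any cyanophage. The structure assumes logistic host growth, mass-action adsorption, an exponentially distributed latent period and first-order phage decay. Every default value in `Params` is likewise assumed.

```python
from dataclasses import dataclass


@dataclass
class Params:
    # All defaults are illustrative assumptions, not measured cyanophage values.
    mu: float = 0.69     # host growth rate, d^-1 (roughly one doubling per day)
    K: float = 1e6       # host carrying capacity, cells ml^-1
    phi: float = 1e-7    # adsorption rate constant, ml phage^-1 d^-1
    beta: float = 50.0   # burst size, phages per lysed cell
    tau: float = 0.5     # mean latent period, d
    m: float = 0.5       # free-phage decay rate, d^-1


def simulate(S: float, I: float, P: float, p: Params, days: float, dt: float = 1e-3):
    """Forward-Euler integration of susceptible (S) and infected (I) hosts and free
    phage (P); returns the final (S, I, P) in per-ml units."""
    for _ in range(int(round(days / dt))):
        infection = p.phi * S * P   # new infections, cells ml^-1 d^-1
        lysis = I / p.tau           # infected cells lysing, cells ml^-1 d^-1
        dS = p.mu * S * (1 - (S + I) / p.K) - infection
        dI = infection - lysis
        dP = p.beta * lysis - infection - p.m * P  # each adsorbed phage is removed
        S = max(S + dS * dt, 0.0)
        I = max(I + dI * dt, 0.0)
        P = max(P + dP * dt, 0.0)
    return S, I, P


if __name__ == "__main__":
    print(simulate(S=1e6, I=0.0, P=10.0, p=Params(), days=10))
```

Measured adsorption constants, latent periods and burst sizes, such as those discussed in the sections that follow, would replace the assumed defaults. The predicted dynamics could then be compared with observed titers.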

Efficiency  of  cross-infection 

The host-range analysis in this thesis offers a preliminary look at trends in phage-host
cross-infection (Chapter 2). However, modeling efforts need more information about each
individual phage-host interaction. I have provided binary data (a phage either infects or does
not infect a given host strain) for the entire host-range matrix (Chapter 2). To model this
system adequately, we will need more detail about these interactions, including the efficiency
of infection for different phage-host pairings. Two questions follow:
- Do the broad host ranges of the Myoviridae come at the expense of rapid and efficient infection
  of any single strain, relative to other Myoviridae?
- Do the more specific host ranges of the Podoviridae correspond to a higher infection
  efficiency?

Revisiting the host-range table and repeating the observations, this time recording the time from
infection to lysis, will therefore be important for refining our understanding of phage-host
dynamics. Such an experiment can now be done in a high-throughput manner using the 96-well plate
format and a fluorescence plate reader. Every phage-host combination can be tested, allowing
quantitative documentation of the timing and extent of host cell lysis. Controlled experiments
with phage and host at similar concentrations across pairings are critical for accurate
comparisons. Together, these data would better describe how efficiently a given cyanophage
infects each strain in a suite of hosts.
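Once plate-reader traces are collected, a lysis time can be extracted for every phage-host pair. The sketch below pairs each infected well with a no-phage control well of the same host. It defines lysis, arbitrarily, as the first reading at which infected-well fluorescence falls below half of the control. Both the data layout of `plate` and the 50% threshold are hypothetical choices.

```python
from typing import Dict, Optional, Sequence, Tuple


def time_to_lysis(times_h: Sequence[float], infected: Sequence[float],
                  control: Sequence[float], threshold: float = 0.5) -> Optional[float]:
    """First time (h) at which infected-well fluorescence falls below
    `threshold` x the no-phage control; None if that never happens.
    The 50% default is an arbitrary, illustrative definition of lysis."""
    for t, f_inf, f_ctl in zip(times_h, infected, control):
        if f_ctl > 0 and f_inf / f_ctl < threshold:
            return t
    return None


def lysis_matrix(times_h: Sequence[float], plate: Dict[Tuple, Sequence[float]],
                 threshold: float = 0.5) -> Dict[Tuple, Optional[float]]:
    """Hypothetical layout: plate[(phage, host)] holds an infected-well trace and
    plate[(None, host)] the matching no-phage control trace."""
    return {(phage, host): time_to_lysis(times_h, trace, plate[(None, host)], threshold)
            for (phage, host), trace in plate.items() if phage is not None}
```

A `None` result can mean either a resistant combination or a slow one. The two can be told apart only if the time series runs long enough to cover the slowest expected lysis.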

Physiological  characterization  of  phage  isolates 

Only one marine cyanophage has been physiologically characterized to obtain the parameters
needed to model the significance of phage-host interactions (Suttle and Chan, 1993). Most phages
that infect Synechococcus belong to the family Myoviridae (Suttle and Chan, 1993; Suttle and
Chan, 1994; Waterbury and Valois, 1993). Yet the only characterized cyanophage, S-BBS1, belongs
to the family Siphoviridae and is no longer in culture (Suttle and Chan, 1993).

S-BBS1 had an average adsorption rate constant of 0.035 min^-1 over a 60-minute incubation. This
is slow relative to values typically seen for other bacteriophages, but within the range reported
for cyanophages infecting freshwater cyanobacteria (Amla, 1981; Cseke and Farkas, 1979; Samimi
and Drews, 1978). A one-step growth curve experiment indicated that cell lysis occurred between
9 and 17 hours after infection. The burst size was estimated at 250 phages per infected cell
(Suttle and Chan, 1993). This burst size is similar to those previously reported for
Siphoviridae that infect freshwater cyanobacteria (Martin, Leach, and Kuo, 1978). It is
considerably larger than the burst sizes estimated for the Myoviridae (Safferman et al., 1972;
Sherman and Connelley, 1976), the phage morphology that most commonly infects Synechococcus
(Suttle and Chan, 1993; Suttle and Chan, 1994; Waterbury and Valois, 1993).
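The two quantities discussed above can be illustrated with simple arithmetic. The sketch treats the reported adsorption constant as a first-order rate of loss of free phage at the host density used in the experiment, which is an assumption. It computes burst size in the textbook way from a one-step growth curve: phages released during the rise, divided by the number of infected cells. The titers in the usage line are hypothetical values chosen to give a burst of 250.

```python
import math


def fraction_unadsorbed(k_per_min: float, minutes: float) -> float:
    """Fraction of phage still free after `minutes`, assuming first-order loss
    of free phage at rate k (min^-1) at a fixed host density."""
    return math.exp(-k_per_min * minutes)


def burst_size(latent_titer: float, rise_titer: float, infected_cells: float) -> float:
    """One-step growth curve estimate: phages released during the rise
    (plateau titer after lysis minus titer during the latent period) per infected cell."""
    return (rise_titer - latent_titer) / infected_cells


if __name__ == "__main__":
    print(round(1 - fraction_unadsorbed(0.035, 60), 2))  # fraction adsorbed after 60 min
    print(burst_size(latent_titer=1e4, rise_titer=2.51e6, infected_cells=1e4))
```

Under the first-order assumption, about 88% of the phage would have adsorbed within the 60-minute incubation.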

Such values are important for understanding and modeling phage-host interactions in the
environment. For example, the length of the lytic cycle and the burst size are critical
parameters for calculating mortality and phage production. Several questions follow:
- What are the values of these parameters for the phages in the MIT cyanophage collection?
- How might they differ between HL and LL Prochlorococcus hosts?
- How might they differ between Podoviridae, Myoviridae and Siphoviridae?

These experiments are not trivial in a system where the host cells double at best once per day.
Nonetheless, they would provide valuable information about adsorption, the length of the latent
period and burst size. Some work on this front is already underway in the Suttle lab with
Synechococcus cyanophages (C.A. Suttle, pers. comm.). It would be beneficial to consult with that
group before beginning these studies with the MIT collection.
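The way these parameters enter such calculations can be written down explicitly. The relations below assume a steady state in which every infected cell eventually lyses. Here β is the burst size (phages per lysed cell), L the mean latent period (d), I the abundance of infected cells and N the host abundance (both cells ml^-1).

$$
V_P = \frac{\beta\, I}{L}, \qquad M = \frac{V_P}{\beta} = \frac{I}{L}, \qquad f = \frac{M}{N}
$$

Here V_P is viral production (phages ml^-1 d^-1), M is virus-induced host mortality (cells ml^-1 d^-1) and f is the fraction of the host population lysed per day. For a given measured viral production, a smaller burst size therefore implies that more host cells were lysed.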

Towards  a  tool  for  genetic  manipulations 

The large number of phages and phage lysates in our collection might include one or more phages
that can transduce (move) genes between their hosts. Such transducing phages are a vital tool for
genetically altering host strains to produce mutants for hypothesis testing. They would allow
critical genetic investigation of many host properties and of the cellular components that may
mediate phage-host interactions. One way to identify transducing phages in the collection is to
screen them for the ability to transfer a selectable marker gene between host strains. However,
this approach requires a selectable marker gene in the host, which must be created or identified
first.

No selectable markers have previously been defined for Prochlorococcus. However, a
streptomycin-resistant (Str-R) mutant has recently been created in the MED4 strain (E. Zinser,
unpubl. data), which will hopefully provide an appropriate selectable marker. The mechanism of
resistance has not yet been determined in Prochlorococcus. In E. coli, streptomycin resistance
arises spontaneously at a frequency of about 10^-9 to 10^-10 per generation. It is often due to a
single point mutation in rpsL, the gene encoding the small-subunit ribosomal protein S12
(E. Zinser, pers. comm.).

Given a Str-R marker, a screen for transducing phages could be conducted as follows:
1. Infect the Str-R mutant strain with a phage isolate.
2. Harvest the phages produced by this infection.
3. Use these phages to infect a streptomycin-susceptible strain.
4. Subject the recipient culture to selection with streptomycin.

In known transducing phages, the molecular steps involved in gene transfer have been described.
The process begins when a portion of the first host's DNA, including the selectable marker gene,
is packaged into a phage capsid during assembly (Calendar, 1988; Stent, 1963). This mispackaged
DNA can produce a phage particle that can still infect but cannot cause lysis, because the
required phage DNA has been replaced with host DNA. If such a particle infects a host cell, the
introduced DNA can undergo genetic exchange with the new host genome via homologous
recombination.

In our proposed assay, the cells that did not lyse would be screened for such recombination
events by testing them for growth on streptomycin-containing medium. PCR, cloning and sequencing
could then confirm that the resistance allele in a putative streptomycin-resistant strain came
from integrated donor DNA and not from a spontaneous mutation. In parallel, it is critical to
quantify the transduction rate. This rate must be shown to exceed the rate of spontaneous
mutation to streptomycin resistance. The spontaneous rate should be measured in cells infected
with the same phage isolate, using aliquots that have not first passed through the Str-R host.
Knowing the transduction rate is also crucial for planning subsequent genetic work.

Temperate phages can also package a small portion of the host genome (Calendar, 1988). They
commonly do so through imprecise excision of the prophage genome from the host genome. The
mispackaged host DNA is therefore limited to DNA adjacent to the phage insertion site. Unless the
temperate phage inserts randomly into the host genome (as phage Mu does), its insertion site is
unlikely to lie near the gene conferring streptomycin resistance. Still, to be certain that
streptomycin resistance in the recipient strain resulted from transduction rather than prophage
insertion, one might confirm that no phage DNA was integrated into the host genome. This might
best be done by Southern hybridization of phage genomic DNA against DNA extracted from host cells
that were washed to remove adsorbed phages.
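The comparison between the transduction rate and the spontaneous background can be made explicit. The sketch assumes that Str-R cells and recipient cells have already been counted, for example by plating or dilution-to-extinction (an assumption for Prochlorococcus). It uses a Poisson approximation to ask how likely the observed number of Str-R cells would be if only the background frequency seen in the control cross were operating. All counts used with it are hypothetical.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)   # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term         # accumulate P(X = i)
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def compare_to_background(treated_res: int, treated_cells: float,
                          control_res: int, control_cells: float):
    """treated_*: Str-R cells and recipients after exposure to a lysate grown on the
    Str-R donor; control_*: the same after exposure to a lysate of the same phage that
    never passed through the Str-R host. Returns (excess Str-R frequency per recipient,
    probability of seeing >= treated_res Str-R cells from background alone)."""
    f_treated = treated_res / treated_cells
    f_control = control_res / control_cells
    expected_background = f_control * treated_cells
    return f_treated - f_control, poisson_upper_tail(treated_res, expected_background)


if __name__ == "__main__":
    excess, p = compare_to_background(50, 1e8, 2, 1e8)  # hypothetical counts
    print(f"excess frequency {excess:.1e} per recipient, p = {p:.1e}")
```

A control cross with zero Str-R cells gives an expected background of zero, so any positive count returns p = 0. In that case a detection limit for the control would be a more honest background estimate.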

Towards  a  direct  measure  for  quantifying  viruses  specific  for  a  particular  host 

Viruses are known to be abundant, dynamic members of the microbial community (Bergh, 1989;
Bratbak, 1990; Proctor and Fuhrman, 1990). Even so, it has remained difficult to enumerate the
phages that are specific to a particular host. Current methods fall short in one of two ways.
Stains such as SYBR and Yo-Pro label viruses indiscriminately, with no indication of the host
strain each virus infects (Hennes and Suttle, 1995; Noble and Fuhrman, 1998). Culture-based
methods such as MPN and plaque assays underestimate viral numbers, because they are limited to
hosts amenable to culture and to phages that can be propagated under laboratory conditions. A
culture-independent approach is therefore needed, for example the use of Q-PCR to quantify
phages of particular types. This approach, however, still relies on identifying and validating
appropriate marker genes in the phage.

We examined the sequence of g20, the gene encoding the portal protein, in the Myoviridae isolates
of our diverse cyanophage collection (Sullivan, Waterbury, and Chisholm, 2003). We chose g20
because it is a commonly used genetic diversity marker (Frederickson and Suttle, 2001; Marston
and Sallee, 2003; Wilson et al., 1999; Wilson et al., 2000; Zhong et al., 2002). However, the
diversity of the portal protein gene sequences did not correlate with either the host strain used
for isolation or the host range of the cyanophage (Chapter 3)(Sullivan et al.). A new genetic
marker should therefore be developed for studies correlating phage gene sequence diversity with
host ITS diversity.

The sequence and structure of the tail fiber protein determine whether a phage can bind to
receptor molecules on the surface of a potential host cell (Henning and Hashemolhosseini, 1994).
Do cyanophage tail fiber gene sequences correlate with the diversity of Prochlorococcus and
Synechococcus ITS sequences? These genes are commonly modular, with highly divergent and highly
conserved regions (Hendrix, 2003), which makes them suitable for designing group-specific PCR
primers. We are uniquely positioned to undertake such a study, because we have:
(1) a diversity of cyanophages in culture,
(2) experience with field-based Q-PCR protocols, and
(3) three sequenced cyanophage genomes (a fourth is on the way).

The growing number of available phage genomes will be useful for designing degenerate PCR
primers to screen our diverse cyanophage collection for tail fiber amplicons. Besides our T7-like
podovirus genome and two T4-like myovirus genomes, six other T4-like genomes (RB49, RB69, Aeh1,
44RR2.8t, T4, KVP40) and five T7-like genomes (T3, T7, P60, SIO1, VpV262) are currently available
for primer design.
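At its simplest, building degenerate primers from an alignment of homologous tail fiber regions means encoding every base observed in each column with an IUPAC ambiguity code. The sketch below does this for a hypothetical ungapped alignment. It also reports the primer's degeneracy, i.e., how many distinct sequences the primer represents. Real primer design would also weigh melting temperature, 3'-end conservation and limits on total degeneracy.

```python
from math import prod

# IUPAC ambiguity codes and the bases each represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of observed bases -> IUPAC code.
CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def degenerate_consensus(aligned: list) -> str:
    """IUPAC consensus that covers every base observed in each column of an
    ungapped alignment (gaps or non-ACGT characters are not handled)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = zip(*(s.upper() for s in aligned))
    return "".join(CODE[frozenset(col)] for col in columns)


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides a degenerate primer represents."""
    return prod(len(IUPAC[c]) for c in primer.upper())


if __name__ == "__main__":
    region = ["ACGTTGCA", "ACGATGCA", "GCGTTGTA"]  # hypothetical aligned segment
    primer = degenerate_consensus(region)
    print(primer, degeneracy(primer))
```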

The Chisholm Lab has already developed a Q-PCR assay based on ITS sequences that quantifies the
abundance of the six known phylogenetic lineages of Prochlorococcus. A complementary Q-PCR assay
based on tail fiber gene sequences could allow concurrent quantification of the cyanophages that
infect particular hosts. A combined, culture-independent Q-PCR assay could specifically quantify
both a given host and the phages specific to it. Such an assay would open the door to experiments
that require parallel tracking of these difficult-to-assay populations.

Such a tool would have many applications. Field work suggests that overall mortality caused by
cyanophages is relatively low for Synechococcus populations (Mann, 2003; Suttle and Chan, 1994;
Waterbury and Valois, 1993). The cyanophage titers reported in this thesis suggest that total
cyanobacterial mortality due to lytic phages might be even lower in the open oceans (Sullivan,
Waterbury, and Chisholm, 2003). However, the cyanobacteria are extremely diverse, and the
cyanophages infecting them have a variety of host ranges. This diversity complicates simple
interpretations and models built on quantifying broad, perhaps biologically uninformative,
groupings of hosts and phages.

Because ecotypes within Prochlorococcus and Synechococcus populations can now be quantified,
other labs are assessing the impact of spiking cyanophage isolates into bottled natural
cyanobacterial communities during 24-48 hour incubations (J. Fuhrman, pers. comm.). Such
experiments raise several questions:
- Myoviridae and Podoviridae have strikingly different host ranges (Chapter 2). Will they have
  correspondingly distinct effects on natural cyanobacterial populations?
- If the experiments were repeated over an annual cycle, would they reveal a seasonal effect on
  how cyanophages interact with specific parts of the cyanobacterial population?
- If so, is such an effect driven by seasonal cycling of the host populations?

Experimental  determination  of  cyanopodophage  promoters 

Because the Prochlorococcus podovirus P-SSP7 and coliphage T7 are extensively similar at the
genome level (Chapter 5), we expect transcription of the P-SSP7 genome to be regulated similarly.
Transcription in T7 has been studied extensively (Dunn and Studier, 1983; Kruger and Schroeder,
1981; Molineux, in press). The class I genes of the phage, including the gene for the
phage-encoded RNA polymerase (RNAP), have E. coli promoters and are transcribed by the host RNAP
upon entry into the cell. The class II and III genes are transcribed by the phage-encoded RNAP,
which recognizes promoters specific to this polymerase. If these general attributes of T7
promoters hold true in our T7-like cyanophage, class I genes should have cyanobacterial
promoters. Class II and class III genes should instead have promoter consensus sequences
specific for the cyanophage RNAP.
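Before biochemical assays, candidate promoters can be nominated computationally by scanning a genome for close matches to a consensus. The sketch below scans both strands for windows within a set number of mismatches of a motif. The T7 class III promoter consensus serves only as a familiar example. A cyanophage RNAP consensus would first have to be inferred, for instance from conserved intergenic regions, so `T7_CLASS_III` should be read as a placeholder for that still-unknown motif.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Example motif only: the T7 class III promoter consensus. A cyanophage RNAP
# consensus is unknown and would have to be inferred first.
T7_CLASS_III = "TAATACGACTCACTATAGGGAGA"


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scan(genome: str, motif: str, max_mismatches: int = 3):
    """Yield (start, strand, mismatches) for each window within max_mismatches
    of the motif on either strand; start is 0-based on the forward strand."""
    genome, motif = genome.upper(), motif.upper()
    width = len(motif)
    targets = (("+", motif), ("-", revcomp(motif)))
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        for strand, target in targets:
            mm = sum(a != b for a, b in zip(window, target))
            if mm <= max_mismatches:
                yield start, strand, mm
```

Hits from such a scan would only be candidates. The binding and 5'-RACE experiments described next would still be needed to confirm them.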

One could test these predicted interactions in vitro. The questions are whether a phage-encoded
T7-type RNAP binds phage promoters, and whether the host RNAP binds its associated promoters in
the phage. The RNAP(s) could be produced as purified recombinant enzymes and used in biochemical
assays as follows:
1. Clone the phage and host RNAP genes into separate E. coli constructs.
2. Overexpress these genes in E. coli to produce large amounts of the respective proteins.
3. Purify the proteins.
4. Use each RNAP in a DNA binding assay against fragmented phage DNA, to determine which regions
   of the phage DNA each RNAP binds.

If the interaction between the phage and/or host RNAP and phage DNA were too weak, it might be
better to measure transcript initiation directly using a 5'-RACE method, as was done for
Prochlorococcus MED4 (Vogel et al., 2003). T7 RNAP acts as a single protein that requires no
accessory proteins or regulatory factors (Dunn and Studier, 1983; Kruger and Schroeder, 1981).
These assays should therefore be relatively easy for a T7-type RNAP. The host RNAP, however, may
prove quite difficult until transcriptional regulation in cyanobacteria is better understood,
because it likely requires several subunits plus regulatory factors.

Expression  analysis  during  infection 

By examining host and phage gene expression during infection, many questions raised in
this thesis (particularly Chapters 4 and 5) could be addressed. Gene expression can be analyzed
at either the mRNA or the protein level, using a variety of tools. Transcription of specific genes
of interest can be examined with RT-PCR and transcript-specific primers, while the entire
transcriptome (genome-wide transcription) can be examined with a microarray. Using unfinished
genome sequence files from the early stages of these phage genome projects, all ORFs from the
genomes of two Prochlorococcus phages, P-SSM4 and P-SSP7, are represented on the Affymetrix
microarray chips designed for the Prochlorococcus host (D. Lindell, A. Wright, pers. comm.).
This will allow simultaneous examination of genome-wide gene expression in both host and phage
during infection and over a range of experimental conditions. Protein expression in a strain
lacking a genetic system can be analyzed using techniques being developed by groups within the
GTL consortium, which combine mass spectrometry with genome data to determine which
proteins are produced at any given time.
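Host and phage probe sets share a single chip. A simple first summary of an infection time series is therefore the share of total signal that comes from phage transcripts. The sketch below rests on several assumptions. Phage probe-set identifiers are taken to carry a hypothetical "phage:" prefix, and intensities are assumed to be already background-corrected. Real Affymetrix identifiers and normalization steps would differ.

```python
def phage_signal_fraction(intensities, phage_prefix="phage:"):
    """Fraction of total array signal attributable to phage probe sets.

    intensities: dict mapping probe-set ID -> background-corrected intensity.
    phage_prefix: hypothetical naming convention marking phage probe sets.
    """
    total = sum(intensities.values())
    if total == 0:
        return 0.0
    phage = sum(v for k, v in intensities.items() if k.startswith(phage_prefix))
    return phage / total


def fraction_time_course(samples):
    """Apply phage_signal_fraction to each time point: {hours: intensities}."""
    return {t: phage_signal_fraction(s) for t, s in sorted(samples.items())}


# Hypothetical toy data for two time points after infection.
samples = {
    0: {"host:psbA": 900.0, "host:rbcL": 100.0, "phage:psbA": 0.0},
    4: {"host:psbA": 300.0, "host:rbcL": 100.0, "phage:psbA": 600.0},
}
print(fraction_time_course(samples))  # {0: 0.0, 4: 0.6}
```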

The genomic observations in this thesis (Chapters 4 and 5) raise many hypotheses that are
now testable. Specifically, the Affymetrix microarrays can be used to examine expression during
host infection of the P-SSP7 and P-SSM4 genes relevant to photosynthesis (psbA, hli), carbon
mobilization (tal gene), phosphate metabolism (pstS, phoH genes) and cobalt usage (cobS gene).
In P-SSP7 (the T7-like virus), expression of the integrase gene could be assayed specifically using
RT-PCR or the Affymetrix microarrays to better understand the possible development of lysogeny
(as described in more detail below).
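Targeted RT-PCR measurements, such as those for the integrase gene, are often reported with the widely used comparative threshold-cycle (2^-ΔΔCt) method. It is shown here only as an assumed analysis choice, not one specified in this thesis. The method presumes near-100% amplification efficiency for both primer pairs. It also needs a reference transcript whose level stays stable during infection, and that stability would itself have to be validated. The Ct values below are hypothetical.

```python
def fold_change_ddct(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    ct_target / ct_reference: threshold cycles of the gene of interest
    (e.g. the phage integrase) and of a reference transcript in the sample.
    *_cal: the same two measurements in a calibrator sample (e.g. an early
    time point). Assumes ~100% amplification efficiency for both assays.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: relative to the reference, the integrase crosses
# threshold 4 cycles earlier than in the calibrator sample.
print(fold_change_ddct(24.0, 20.0, 28.0, 20.0))  # 16.0
```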

The  role  of  lysogeny 

The fact that the podovirus P-SSP7 is a T7-like phage that includes an integrase gene
(Chapter 5) raises the question of whether this integrase gene is functional and allows P-SSP7 to
integrate into the host genome as a prophage. To date, no other T7-like phages are known to contain
integrase genes, and there are no temperate marine cyanophages in culture. Experimental evidence
that P-SSP7 can integrate into the host genome would show that this phage is temperate and
capable of integrating into host DNA as a prophage. Using either RT-PCR or microarrays to
follow the expression of the phage-encoded integrase gene will be critical to show whether and
when this gene is expressed during infection, and how lysogeny could be affected by differing
experimental conditions. If the integrase gene is expressed, it will then be important to
demonstrate that the phage DNA is actually integrated into the host genome. This could be done
through Southern hybridization with a phage genome probe against DNA extracted from washed
host cells (as above). Alternatively, convincing support for prophage integration might be
achieved with a carefully designed PCR assay using the putative lysogen DNA as a
template. In this assay, one primer could be designed as an outward-facing primer at
one end of the phage genome, while a second, random primer would yield a variety of
products. Subsequent sequencing could identify whether a product contained both phage and
host DNA. Though more difficult than a Southern blot, this assay would also reveal
where in the host genome the prophage is integrated.
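Deciding whether a sequenced product spans a phage-host junction means asking whether one part of the read matches only the phage genome and another part matches only the host genome. The sketch below answers this with exact k-mer lookups, a simple stand-in for an alignment tool such as BLAST. The function names, k-mer length and toy sequences are illustrative assumptions. The sketch ignores reverse complements and sequencing errors.

```python
def kmer_set(seq, k):
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def junction_offset(read, phage_genome, host_genome, k=12):
    """Approximate read offset where phage-only k-mers give way to host-only
    k-mers (or the reverse); None if the read does not look chimeric.

    k-mers found in both genomes, or in neither, are skipped as uninformative.
    Only the forward strand of each genome is searched.
    """
    phage_k = kmer_set(phage_genome, k)
    host_k = kmer_set(host_genome, k)
    last_label = None
    for i in range(len(read) - k + 1):
        word = read[i:i + k]
        in_phage, in_host = word in phage_k, word in host_k
        if in_phage == in_host:
            continue
        label = "phage" if in_phage else "host"
        if last_label is not None and label != last_label:
            return i
        last_label = label
    return None


# Toy sequences (hypothetical): the read ends in phage DNA and continues into host DNA.
phage = "AAAACCCCGGGG"
host = "TTTTGAGAGTGT"
print(junction_offset("CCCCGGGGTTTTGAGA", phage, host, k=4))  # 8
print(junction_offset("AAAACCCC", phage, host, k=4))          # None
```

The returned offset is the first informative k-mer after the switch. The true junction therefore lies between it and the last informative k-mer on the other side. The host-derived flank could then be located in the host genome to identify the integration site.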

While the experimental steps required to confirm that the prophage is indeed integrated
into the host genome seem relatively manageable, the difficult step will likely be finding the
conditions and/or host strain under which the phage actually integrates. As a
starting point for finding such conditions, temperate phages classically integrate when
the host cell is growing with limited nutrients, because under these conditions the
host is unable to supply the building blocks and energy required to produce progeny phage
particles (Calendar, 1988).

Are  ‘host’  genes  prevalent  among  cyanophages  infecting  marine  cyanobacteria? 

The photosynthetic genes psbA and hliP are found in all three Prochlorococcus
cyanophage genomes sequenced to date (Lindell et al., submitted), but not in all Synechococcus
cyanophages (Mann et al., 2003; Millard et al., submitted). It would be interesting to assay for the
presence of these genes and others (e.g., psbD, pstS, phoH, talC) in the other isolates of the MIT
cyanophage collection. Because PCR can amplify DNA from very few template copies, and because
these genes are also found in the hosts, such a study would require appropriate controls. Phage
and host DNA could be separated in one of two ways: (1) phage particles can be separated from
host DNA by density centrifugation in cesium chloride gradients, or (2) host DNA can be
degraded with DNase while the phage DNA is still protected within the phage heads. The phage
particles (purified by either cesium chloride or DNase treatment) could then be used as template for PCR
reactions with primers designed to amplify the gene of interest. Amplifying, cloning
and sequencing the genes of interest from each putative host strain beforehand would provide a library
of host gene sequences that could help determine whether a sequenced PCR product from a phage
lysate was amplified from contaminating host DNA or from packaged phage DNA. Finally, Southern
blots using probes for both a phage structural gene and a potential "host" gene of interest could
confirm that the given gene is indeed in the phage genome. Together, these two experiments
would build a strong case that the host gene of interest is actually present in the phage genome.
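The comparison against the host gene library can be sketched as a best-match search. The example below scores a PCR product by the fraction of its k-mers found in each host allele (k-mer containment), a simplified stand-in for a proper alignment. The library contents, strain labels, k and the 0.99 threshold are all hypothetical. A product essentially identical to a host allele is more likely contamination. A product that differs from every sequenced host allele argues for packaged phage DNA.

```python
def kmer_set(seq, k):
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, reference, k):
    """Fraction of the query's k-mers that also occur in the reference."""
    q = kmer_set(query, k)
    if not q:
        return 0.0
    return len(q & kmer_set(reference, k)) / len(q)


def classify_product(product, host_library, k=15, threshold=0.99):
    """Compare a phage-lysate PCR product with host alleles of the same gene.

    host_library: dict mapping host strain label -> allele sequence.
    Returns (best-matching strain, containment score, verdict).
    """
    strain, score = max(
        ((name, containment(product, seq, k)) for name, seq in host_library.items()),
        key=lambda pair: pair[1],
    )
    if score >= threshold:
        verdict = "likely host contamination"
    else:
        verdict = "distinct from sequenced host alleles"
    return strain, score, verdict


# Hypothetical host alleles of one gene.
library = {"hostA": "ATGGCAACCGTTAGCTAGGCATCG", "hostB": "ATGGCTACGGTAAGTTAGCCATGG"}
print(classify_product(library["hostA"], library, k=4))
```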

The  big  picture  and  big  obstacles 

While the abundance of phages in the marine environment is well recognized (Bergh,
1989; Bratbak, 1990; Proctor and Fuhrman, 1990), their genetic and functional diversity is only
beginning to be appreciated. Phages likely influence host microbial processes as
reservoirs of genetic information that affects the evolution and physiology of their hosts
(Lawrence, Hendrix, and Casjens, 2001; Lindell et al., submitted). Because phage genomes are
small, 100 phage genomes can be sequenced for the same cost as one large bacterial
genome (Paul et al., 2002). This return on investment means that phage genome sequencing
should yield data that provide insight into the evolutionary and
biogeochemical processes of the marine microbial food web.

However, there are two significant barriers to be overcome. First, it is difficult, and in
some cases prohibitive, to obtain concentrated phage stocks that provide sufficient material for
genomic sequencing. Collective efforts with marine phage-host systems to optimize growth
conditions that yield high phage titers, and to improve purification methods, should prove invaluable
in preparing phages for sequencing. Second, experimental confirmation of genome-enabled
hypotheses requires the development of well-characterized phage-host systems that can be
usefully probed in the laboratory. As evidenced in Chapters 4 and 5, bioinformatics only gets us
so far. The proof lies in experimental documentation.

In addition to learning about new functional interactions between phages and their hosts, it is
important to quantify particular phage types in an ecologically meaningful way. Currently, we
lack the ability to quantify phages that infect particular hosts without using biased culture-based
methods. Thus, the development of unbiased culture-independent assays is essential to
accurately assess the roles of phages in mortality and co-evolution in the environment. While a
significant database of portal protein gene sequences from myoviruses exists, it is unclear
whether this taxonomic marker will indicate which hosts a phage can infect (Chapter 3).
As an alternative, we suggest a new taxonomic marker based on the distal tail fiber gene. The tail fiber
protein determines the host specificity of the phage (Henning and Hashemolhosseini, 1994), so it
seems a logical target for the development of such an assay. Due to the modular and mosaic
nature of these genes, such work is likely to be quite complicated and to require expertise in protein
biology, bioinformatics and molecular biology. However, the availability of many phage
genomes for the types of phages most commonly found in the marine environment (T4-like and
T7-like phages) should provide sufficient starting material for rigorous identification of
conserved regions within these genes. Such regions could serve as sites for targeted PCR primers
spanning the divergent regions that may carry host-range information.
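Finding conserved stretches that flank a divergent region can be sketched as a scan over a multiple sequence alignment. The code below is a minimal illustration that makes several assumptions. The alignment is already computed, with gaps written as '-'. Conservation is simply the fraction of sequences sharing the most common residue in a column. Window width, score cutoff and minimum span are hypothetical parameters. Practical primer design would also need melting-temperature, degeneracy and specificity checks.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of sequences carrying the most common non-gap residue.

    alignment: list of equal-length aligned sequences; gaps count against conservation.
    """
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        scores.append(Counter(residues).most_common(1)[0][1] / n)
    return scores


def conserved_windows(alignment, width, min_score):
    """Start columns of windows whose every column scores at least min_score."""
    scores = column_conservation(alignment)
    return [i for i in range(len(scores) - width + 1)
            if all(s >= min_score for s in scores[i:i + width])]


def primer_site_pairs(alignment, width, min_score, min_span):
    """Pairs of conserved windows (left, right) separated by at least min_span
    columns: candidate forward/reverse primer sites bracketing a region."""
    starts = conserved_windows(alignment, width, min_score)
    return [(a, b) for a in starts for b in starts if b - (a + width) >= min_span]


# Toy alignment (hypothetical): conserved ends flank a variable middle.
aln = ["ACGTAAAAACGT", "ACGTCCCCACGT", "ACGTGGGGACGT"]
print(primer_site_pairs(aln, width=4, min_score=1.0, min_span=4))  # [(0, 8)]
```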

The age of phage is upon us. I look forward to what we will learn.
APPENDIX A


THE  MIT  CYANOPHAGE  COLLECTION 


Table  of  Established  Clonal  Cyanophage  Isolates  and  Environmental  Lysates 

The following pages describe the clonal phage isolates and lysates in the MIT
cyanophage collection. Many of these phage lysates are unpublished but may prove valuable
resources for future studies of phages. Column abbreviations: TEM Morph = morphology by
transmission electron microscopy (M = Myoviridae, S = Siphoviridae, P = Podoviridae);
Head diam. = diameter across the head (nm); Tail Length and Tail Width are also given in
nanometers; # / window = number of particles observed per window on a 200-mesh copper
carbon type B grid (Ted Pella, Redding, CA, #01811).
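Several Notes entries record plaque-assay counts for lysates. The standard plaque-assay arithmetic converts a plate count into a titer of the undiluted lysate by dividing by the dilution and by the volume plated. The sketch below assumes that convention. The plaque number, dilution and volume in the example are hypothetical and are not taken from this collection's records.

```python
def titer_pfu_per_ml(plaques, dilution, volume_plated_ml):
    """Plaque-forming units per ml of the undiluted lysate.

    plaques: plaques counted on one plate.
    dilution: fraction of lysate in the plated suspension (1e-5 for a 10^-5 dilution).
    volume_plated_ml: volume of the diluted suspension added to the plate, in ml.
    """
    if dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return plaques / (dilution * volume_plated_ml)


# Hypothetical plate: 77 plaques from 0.1 ml of a 10^-5 dilution.
print(f"{titer_pfu_per_ml(77, 1e-5, 0.1):.2e} PFU/ml")  # 7.70e+07 PFU/ml
```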


Virus 

Locale 

Depth  (m) 

Date Collected

Dilution 

TEM  Morph 

Head  diam. 

Tail  Length 

Tail  Width 

#  /  window 

Date Isolated

Notes 

9211-1 

BATS 

100 

6-Jun-00 

0.1 

M 

60 

100 

20 

7-Aug-00 

9211-2 

BATS 

100 

6-Jun-00 

0.1 

M 

60 

100 

20 

7-Aug-00 

9211-3 

BATS 

100 

6-Jun-00 

0.01 

M 

? 

60 

15 

15 

7-Aug-00 

probably  a  myo 

9211-4 

BATS 

75 

6-Jun-00 

0.1 

M 

? 

60 

110 

15 

10 

13-Sep-00

9211-5 

BATS 

75 

6-Jun-00

1 

P? 

40 

5 

13-Sep-00

9211-6 
45

Sep-99 

1 

M 

? 

9211-7 
100

Sep-99 

1 

9211-8 
Sep-99
9211-9
120

Sep-99 

0.1 

9211-10 
70

Sep-99 

1 

30-Jan-01 

9211-11 
Sep-99

0.1 

30-Jan-01

9211-12 
1

30-Jan-01

9211-13 
Sep-99
30-Jan-01
plaque
smooth edge

9211-17 

Sarg 
plaque
clear, crisp edge

9211-18 

Sarg 
big, rough edge
Red Sea
0.01

P 

40 
29-Aug-00
Red Sea
15-Jul-00

0.01 

P 

40 

12 
29-Aug-00

9215-5 

- 

- 
-

P? 

65 

10 

large  Podos  never 
clear  /  sharp,  but 
common 

9215-6 

Red  Sea 
S?

40 
6

1 

29-Aug-00

R2  liquid,  only  podos 
(40nm) 

9215-6 

- 
-
P
50

CLONAL 
P

40 
13-Sep-00
P

X 

1 

9215-9 

Red  Sea 
P

40 
Red Sea
1

P 

40 
25-Oct-00

9215-10 

- 

- 

- 

- 

P 

X 
Red Sea
P

40 
-

P 
P
P

40 

50 

15-Aug-00

9302-2

BATS 

100 

6-Jun-00

1 

P 

40 

20 

15-Aug-00

9302-3 

BATS 

100 

6-Jun-00 

1 

M 

60 

100 

12 

15-Aug-00

2  shots  contractile  tail 

9302-4 

BATS 

100 

6-Jun-00

1 

P 

40 

20 

15-Aug-00

9302-5 

BATS 

100 

6-Jun-00

1 

P 

40 

20 

15-Aug-00

9302-6 

Red  Sea 

sfc 

15-Jul-00 

1 

P 

40 

50 

1  -Aug-00 

9302-6 

- 

- 

- 

- 

M 

9 

X 

IX 

12 

1 

9302-7 

Red  Sea 

sfc 

15-Jul-00

1 

P 

40 

100 

1 -Aug-00 

9302-7 

- 

- 

- 

- 

M 

X 

120 

12 

40 

9302-8 

BATS 

75 

6-Jun-00 

0.1 

P 

40 

200 

8-Sep-00 

9302-8 

- 

- 

- 

- 

M 

? 

60 

none? 

1 

9302-9 

BATS 

75 

6-Jun-00

1 

NOTHING 

8-Sep-00 

9302-10

Red  Sea 

5 

13-Sep-00 

1 

P 

40 

30 

25-Oct-00

9302-11 

Red  Sea 

50 

13-Sep-00 

1 

P 

40 

30 

25-Oct-00
9302-12

Red  Sea 

100 

13-Sep-00 

1 

P 

40 

10 

25-Oct-00

9302-13 

BATS 

100 

Sep-99 

0.1 

P 

40 

500 

4-Jan-01

9302-13 

- 

- 

- 

- 

P 

60 

1 

9302-14 

BATS 

120 

Sep-99 

0.1 

P 

40 

300 

4-Jan-01 

9302-15 

Red  Sea 

5 

13-Sep-00 

0.1 

7-Mar-01 

9302-16 

Red  Sea 

50 

13-Sep-00 

0.01 

7-Mar-01 

9302-17 

Red  Sea 

100 

13-Sep-00 

0.1 

7-Mar-01 

9302-18 

Red  Sea 

130 

13-Sep-00 

1 

7-Mar-01 

9303-1 

BATS 

100 

6-Jun-00 

1 

M 

? 

60 

10 

15-Aug-00

probably  myos 

9303-2 

BATS 

100 

6-Jun-00 

1 

M 

60 

10 

15-Aug-00

CLONAL,  HR,  PCR 

9303-3 

Red  Sea 

sfc 

15-Jul-00 

1 

M 

60 

100 

12 

30 

8-Sep-00 

CLONAL,  HR,  PCR 

9303-4 

BATS 

75 

6-Jun-00 

0.1 

NOTHING 

8-Sep-00 

9303-5 

BATS 

75 

6-Jun-00 

0.1 

M 

? 

2 

8-Sep-00 

9303-6 

BATS 

75 

6-Jun-00 

1 

M 

60 

110 

12 

5 

8-Sep-00 

Cannot  find  stock  ??? 

9303-7 

BATS 

100 

6-Jun-00 

1 

M 

60 

110 

12 

2 

8-Sep-00 

Cannot  find  stock  ??? 

9303-8 

Red  Sea 

5 

13-Sep-00

plaque 

M 

10 

0 

75 

25 

11-May-01

large plaques ~3+ mm

9303-9 

Red  Sea 

100 

13-Sep-00 

plaque 

11-May-01

large  plaques  ~3+mm 

9303-10 

Red  Sea 

130 

13-Sep-00 

plaque 

M 

75 

75 

25 

11-May-01

large plaques ~3+ mm

9312-1 

BATS 

10 

6-Jun-00 

1 

P 

40 

40 

15-Aug-00

NOT  a  virus  . . . 

9312-1 

P 

60 

25 

5 

9312-2 

BATS 

100 

6-Jun-00 

1 

P 

40 

13-Aug-00

9312-3 

BATS 

100 

6-Jun-00 

1 

P 

40 

20 

13-Aug-00

9312-4 

BATS 

100 

6-Jun-00 

1 

P 

40 

1 

13-Aug-00

9312-5 

BATS 

100 

6-Jun-00

1 

P 

40 

1 

13-Aug-00

9312-6 

BATS 

100 

6-Jun-00 

1 

P 

40 

1 

13-Aug-00

9312-7 

BATS 

75 

6-Jun-00

1 

P 

40 

5 

8-Sep-00 

9312-8 

Red  Sea 

50 

13-Sep-00

1 

P 

40 

25 

25-Oct-00

9312-9 

BATS 

120 

Sep-99 

0.1 

P 

40 

50 

4-Jan-01

CLONAL 

9312-10 

BATS 

100 

Sep-99 

0.1 

P 

55 

50 

4-Jan-01

CLONAL 

9312-11 

BATS 

70 

Sep-99 

1 

P 

55 

100 

4-Jan-01

7.7x10^7 plaque assay
count,  CLONAL 

9312-12 

BATS 

45 

Sep-99 

0.01 

P 

40 

10 

4-Jan-01

CLONAL 

9312-12 

P 

60 

2 

9312-13 

BATS 

3 

Sep-99 

1 

P 

40 

10 

4-Jan-01

CLONAL 

9312-14 

BATS 

15 

Sep-99 

0.1 

P 

50 

300 

17-Jan-01 

CLONAL 

9312-15 

Red  Sea 

50 

13-Sep-00 

1 

P 

40 

30 

2-Mar-01 

9313-1 

Shelf 

40 

Sep-01 

plaque 

M 

25-Nov-01 

big,  clear  plaques  with 
uneven  edges,  2x 
plaque  purified 

9313-2 

Shelf 

0 

Sep-01 

plaque 

M 

25-Nov-01 

2x plaque purified

9313-3 

Slope 

60 

17-Sep-01 

plaque 

S 

60 

12-Dec-01

2x plaque purified
9313-4

Slope 

83 

17-Sep-01 

plaque 

S 

50 

250 

15 

30 

12-Dec-01 

100 nm elongated
head,  2x  plaque 
purified 

9515-1 

Red  Sea 

sfc 

15-Jul-00 

0.1 

P 

40 

50 

9515-2 

Red  Sea 

sfc 

15-Jul-00

1 

P 

40 

50 

9515-3 

BATS 

10 

6-Jun-00 

0.1 

P 

40 

5 

9515-4 

BATS 

10 

6-Jun-00 

1 

P 

40 

30 

9515-5 

BATS 

75 

6-Jun-00 

0.01 

P 

40 

50 

9515-6 

BATS 

75 

6-Jun-00 

0.1 

P 

40 

30 

9515-7 

BATS 

75 

6-Jun-00 

1 

P 

40 

30 

9515-8 

BATS 

100 

6-Jun-00 

0.1 

P 

40 

10 

9515-9 

BATS 

100 

6-Jun-00 

1 

P 

40 

10 

9515-10 

BATS 

120 

Sep-99 

0.1 

P 

55 

300 

4-Jan-01 

2x10^8 plaque count

9515-11 

BATS 

100 

Sep-99 

0.01 

P 

55 

1000 

4-Jan-01

2.1x10^8 plaque count

9515-12 

BATS 

70 

Sep-99 

0.1 

P 

40 

1000 

4-Jan-01

2.5x10^8 plaque count

9515-13 

BATS 

45 

Sep-99 

0.1 

P 

40 

1000 

4-Jan-01

2.9x10^8 plaque count

9515-14 

BATS 

3 

Sep-99 

1 

P 

40 

20 

8-Jan-01

1.8x10^8 plaque count

9515-14 

- 

- 

- 

- 

P 

60 

5 

9515-15 

Red  Sea 

5 

13-Sep-00 

0.1 

P 

40 

50 

2-Mar-01 

9515-16 

Red  Sea 

50 

13-Sep-00

0.1 

P 

50 

? 

100 

2-Mar-01 

9515-17 

Red  Sea 

100 

13-Sep-00

0.1 

P 

50 

9 

100 

2-Mar-01 

9515-18 

Red  Sea 

130 

13-Sep-00

0.1 

P 

50 

? 

100 

2-Mar-01 

SS120-1

BATS 

100 

Sep-99 

LV 

P 

60 

15 

10 

40 

13-Feb-00 

0.2um  filtered 

SS120-2

BATS 

100 

Sep-99

LV 

P 

50 

40 

16-Feb-00

0.2um  filtered 

SS120-3

BATS 

120 

Sep-99 

1 

P 

50 

1000 

SS120-4

BATS 

45 

Sep-99 

1 

P 

50 

1000 

SS120-5 

Red  Sea 

130 

13-Sep-00

1 

M 

80 

+ 

30+ 

25 

10 

7-Mar-01 

SS120-6 

Slope 

83 

17-Sep-01 

plaque 

P 

50 

10 

1000 

Os 

29-Dec-01

CLONAL 

MED4-1 

BATS 

3 

Sep-99 

LV 

P 

40 

few 

15-May-00 

0.2um  filtered 

MED4-2 

BATS 

15 

Sep-99 

LV 

P 

40 

few 

15-May-00

0.2um  filtered 

MED4-3 

BATS 

40 

Sep-99 

LV 

P 

40 

100 

15-May-00

0.2um  filtered 

MED4-3 

- 

- 

- 

- 

M/ 

S? 

65 

110 

few 

MED4-4 

BATS 

70 

Sep-99 

LV 

P 

40 

300 

15-May-00

0.2um  filtered 

MED4-5 

BATS 

100 

Sep-99 

LV 

P 

40 

500 

2-Feb-00 

0.2um  filtered, 
CLONAL 

MED4-6 

Gulf 

Stream 

3 

Sep-99 

LV 

P 

40 

? 

15-May-00

0.2um  filtered, 
CLONAL 

MED4-7 

Gulf 

Stream 

40 

Sep-99 

LV 

P 

40 

? 

15-May-00

0.2um  filtered 

MED4-7 

- 

- 

- 

- 

M 

? 

65 

110 

? 

MED4-8 

Gulf 

Stream 

80 

Sep-99 

LV 

P 

45 

50 

2-Feb-00 

0.2um  filtered, 
CLONAL 
MED4-9

Core 

3 

Sep-99 

LV 

P 

40 

10 

17-Jul-00 

0.2um  filtered, 

CLONAL 

MED4-10 

Core 

40 

Sep-99 

LV 

P 

40 

50 

15-May-00

0.2um  filtered 

MED4-10 

- 

- 

- 

- 

P 

60 

few 

MED4-11 

Core 

80 

Sep-99 

LV 

P 

40 

100 

15-May-00

0.2um  filtered 

MED4-12 

Core 

100 

Sep-99 

LV 

P 

40 

450 

15-May-00

0.2um  filtered 

MED4-13 

Core 

120 

Sep-99 

LV 

P 

40 

80 

15-May-00

0.2um  filtered 

MED4-13 

- 

- 

- 

- 

M 

? 

65 

110 

few 

MED4-14 

BATS 

10 

6-Jun-00 

1 

P 

40 

1000 

15-Aug-00

MED4-15 

BATS 

100 

6-Jun-00 

1 

P 

60 

20 

15-Aug-00

MED4-16 

Slope 

70 

Sep-99 

LV 

P 

40 

? 

14-Mar-00

0.2um  filtered 

MED4-17 

Red  Sea 

sfc 

15-Jul-00 

0.001 

P 

40 

10 

15-Aug-00

MED4-18 

Red  Sea 

sfc 

15-Jul-00

0.01 

P 

40 

100 

15-Aug-00

MED4-19 

Red  Sea 

sfc 

15-Jul-00

0.01 

P 

50-60? 

few 

15-Aug-00

MED4-20 

Red  Sea 

sfc 

15-Jul-00 

0.1 

NOTHING 

15-Aug-00

MED4-21 

BATS 

70 

Sep-99 

0.01 

P 

60 

30 

8-Sep-00 

MED4-22 

BATS 

45 

Sep-99 

0.1 

P 

40 

300 

8-Sep-00 

MED4-23 

BATS 

45 

Sep-99 

0.1 

NOTHING 

8-Sep-00 

MED4-24 

BATS 

70 

Sep-99 

0.1 

P 

40 

2 

13-Sep-00 

MED4-25 

BATS 

70 

Sep-99 

0.1 

P 

60 

4 

13-Sep-00 

MED4-26 

BATS 

100 

Sep-99 

0.01 

P 

40 

50 

13-Sep-00 

MED4-27 

BATS 

100 

Sep-99 

0.1 

P 

40 

5 

13-Sep-00

MED4-28 

BATS 

100 

Sep-99 

0.1 

P 

40 

5 

13-Sep-00

MED4-29 

BATS 

75 

6-Jun-00 

0.001 

P 

40 

1 

13-Sep-00

low  contrast 

MED4-30 

BATS 

75 

6-Jun-00 

0.01 

P 

40 

5 

13-Sep-00

MED4-31 

BATS 

75 

6-Jun-00 

0.1 

P 

40 

10 

13-Sep-00

MED4-32 

BATS 

15 

Sep-99 

1 

P 

40 

20 

24-Sep-00 

MED4-33 

BATS 

15 

Sep-99 

1 

P 

40 

5 

24-Sep-00 

MED4-34 

BATS 

3 

Sep-99 

1 

P 

40 

10 

24-Sep-00 

MED4-35 

BATS 

3 

Sep-99 

0.1 

P 

40 

24-Sep-00 

MED4-36 

BATS 

3 

Sep-99 

0.1 

P 

40 

80 

24-Sep-00 

MED4-37 

BATS 

120 

Sep-99 

0.01 

P 

40 

50 

24-Sep-00 

MED4-38 

BATS 

120 

Sep-99 

0.1 

P 

40 

40 

24-Sep-00 

MED4-39 

BATS 

120 

Sep-99 

1 

P 

40 

100 

24-Sep-00 

pos,  neg  stained 

podos 

MED4-40 

Red  Sea 

5 

13-Sep-00 

0.001 

P 

40 

15 

25-Oct-00

MED4-41 

Red  Sea 

5 

13-Sep-00 

0.01 

P 

40 

50 

25-Oct-00

MED4-42 

Red  Sea 

5 

13-Sep-00

0.1 

P 

40 

100 

25-Oct-00

CLONAL 

MED4-43 

Red  Sea 

5 

13-Sep-00

1 

P 

40 

10 

25-Oct-00

MED4-44 

Red  Sea 

50 

13-Sep-00 

0.01 

P 

40 

30 

25-Oct-00

CLONAL 

MED4-45 

Red  Sea 

50 

13-Sep-00

1 

P 

40 

50 

25-Oct-00

MED4-46 

Red  Sea 

100 

13-Sep-00 

0.01 

P 

40 

10 

25-Oct-00

MED4-47 

Red  Sea 

100 

13-Sep-00 

1 

P 

40 

50 

25-Oct-00

MED4-48 

Red  Sea 

130 

13-Sep-00 

0.1 

P 

40 

50 

25-Oct-00

CLONAL 
MED4-49

Red  Sea 

130 

13-Sep-00 

1 

P 

40 

30 

25-Oct-00

MED4-50 

BATS 

100 

Sep-99 

0.01 

P 

40 

50 

3-Nov-00 

MED4-51

BATS 

100 

Sep-99 

0.1 

P 

40 

300 

3-Nov-00

MED4-52 

BATS 

100 

Sep-99 

1 

P 

40 

100 

3-Nov-00 

MED4-53 

Red  Sea 

5 

13-Sep-00 

0.01 

P 

55 

30 

2-Mar-01 

no  tails  observed  at  all 

MED4-54 

Red  Sea 

50 

13-Sep-00 

0.001 

P 

55 

100 

2-Mar-01 

no  tails  observed  at  all 

MED4-55 

Red  Sea 

100 

13-Sep-00 

0.1 

P 

60 

30 

2-Mar-01 

no  tails  observed  at  all 

MED4-56 

Red  Sea 

130 

13-Sep-00 

1 

P 

60 

30 

2-Mar-01 

no  tails  observed  at  all 

NATL1A-1 

BATS 

100 

6-Jun-00

0.01 

P? 

40 

7-Aug-00 

TEM-lxlO^ 

NATL1A-1 

- 

- 

- 

- 

M 

65 

100 

12 

20 

plaque  CLONAL  (2x) 

NATL1A-2 

BATS 

100 

6-Jun-00

0.01 

M 

65 

100 

12 

50 

7-Aug-00 

NATL1A-3 

BATS 

100 

6-Jun-00

0.01 

P 

55-60 

7-Aug-00 

NATL1A-4 

BATS 

100 

6-Jun-00

0.01 

NOTHING 

7-Aug-00 

NATL1A-5 

Red  Sea 

5 

13-Sep-00 

1 

7-Mar-01 

NATL1A-6 

Red  Sea 

50 

13-Sep-00

0.1 

7-Mar-01 

NATL1A-7 

Red  Sea 

130 

13-Sep-00 

1 

M 

7-Mar-01 

NATL1A-8 

missing 

NATL1A-9 

BATS 

75 

6-Jun-00

0.1 

M 

100 

16-Oct-00

NATL1A-10 

BATS 

75 

6-Jun-00

1 

P 

40 

1000 

16-Oct-00

NATL1A-11 

BATS 

45 

Sep-99 

1 

P 

40 

10 

6-Jan-01

NATL1A-12 

BATS 

70 

Sep-99 

0.1 

P 

40 

1000 

6-Jan-01

NATL1A-13 

BATS 

100 

Sep-99 

0.01 

P 

40 

100 

6-Jan-01 

NATL1A-14 

BATS 

100 

Sep-99 

0.01 

P 

40 

1000 

6-Jan-01 

NATL1A-15 

BATS 

120 

Sep-99 

1 

M 

75 

30+ 

25 

30 

6-Jan-01 

NATL2A-1 

BATS 

100 

6-Jun-00

0.01 

M 

65 

170 

20 

NATL2A-2 

BATS 

100 

6-Jun-00

0.01 

P 

40 

100 

7-Aug-00 

NATL2A-2 

- 

- 

- 

- 

P 

60 

10 

NATL2A-3 

BATS 

100 

6-Jun-00

0.01 

M 

75 

85 

15 

1 

7-Aug-00 

plaque  CLONAL 

NATL2A-4 

BATS 

10 

6-Jun-00

1 

M 

80 

160 

20 

100 

11-Aug-00

plaque  CLONAL 

NATL2A-5 

Red  Sea 

sfc 

15-Jul-00 

1 

M 

60 

110 

12 

5 

4-Sep-00 

PLATES  15  JAN  01 

NATL2A-6 

Red  Sea 

sfc 

15-Jul-00

1 

M 

60 

110 

12 

3 

4-Sep-00 

NATL2A-7 

Red  Sea 

sfc 

15-Jul-00 

1 

M 

60 

110 

12 

3 

4-Sep-00 

spilled, only ~1 ml left

NATL2A-8 

Red  Sea 

sfc 

15-Jul-00

0.01 

8-Sep-00 

NATL2A-9 

Red  Sea 

sfc 

15-Jul-00

0.01 

M 

60 

110 

12 

10 

8-Sep-00

NEG  STAINED 

NATL2A-10 

BATS 

75 

6-Jun-00

0.1 

P? 

60 

150 

8-Sep-00

probably  myos 

NATL2A-11

BATS 

75 

6-Jun-00

0.1 

P? 

60 

few 

13-Sep-00 

probably  myos 

NATL2A-12 

BATS 

75 

6-Jun-00

0.1 

M 

60 

110 

12 

3 

13-Sep-00 

NATL2A-13 

BATS 

15 

Sep-99 

0.01 

M 

60 

110 

12 

3 

14-Sep-00

no  photo  but  similar  to 
12 

NATL2A-14 

BATS 

15 

Sep-99 

0.01 

M 

70 

80+ 

20 

3 

14-Sep-00

CLONAL 

NATL2A-15 

BATS 

15 

Sep-99 

0.1 

P? 

60 

2 

14-Sep-00

probably  myos 

NATL2A-16 

BATS 

3 

Sep-99 

0.1 

M 

60 

110 

12 

1 

14-Sep-00

NATL2A-17 

BATS 

3 

Sep-99 

0.1 

M 

75 

160 

20 

10 

14-Sep-00
NATL2A-18

BATS 

3 

Sep-99 

0.1 

M 

? 

75 

5 

14-Sep-00 

NATL2A-19 

BATS 

40 

Sep-99

0.1 

M 

1 

27-Sep-00 

new  stock  (21Jan01), 
CLONAL 

NATL2A-20 

BATS 

40 

Sep-99 

0.1 

?? 

27-Sep-00

spilled, none  left, never 
looked  at 

NATL2A-21 

BATS 

40 

Sep-99 

1 

27-Sep-00 

new  stock  (21Jan01) 

NATL2A-22 

BATS 

70 

Sep-99 

0.01 

27-Sep-00 

new  stock  (21Jan01) 

NATL2A-23 

BATS 

70 

Sep-99 

0.1 

27-Sep-00

new  stock  (21Jan01) 

NATL2A-24 

BATS 

70 

Sep-99 

1 

P 

40 

10 

27-Sep-00

new  stock  (21Jan01) 

NATL2A-25 

BATS 

120 

Sep-99 

0.01 

M 

75 

110 

+ 

20 

500 

27-Sep-00

NATL2A-26 

BATS 

120 

Sep-99 

0.1 

M 

? 

75 

50 

27-Sep-00

NATL2A-27 

BATS 

120 

Sep-99 

1 

M 

75 

150 

20 

3 

27-Sep-00

NATL2A-28 

Red  Sea 

50 

13-Sep-00 

0.001 

M 

75 

30+ 

25 

5 

21-Oct-00

NATL2A-29 

Red  Sea 

50 

13-Sep-00 

0.01 

NOTHING 

21-Oct-00

NATL2A-30 

BATS 

100 

Sep-99 

0.01 

M 

75 

100 

25 

5 

21-Oct-00

CLONAL 

NATL2A-31 

BATS 

100 

Sep-99 

1 

P 

50 

? 

10 

21-Oct-00

CLONAL 

NATL2A-32 

Red  Sea 

5 

13-Sep-00 

0.1 

M 

75 

75+ 

20 

5 

3-Nov-00 

NATL2A-33 

Red  Sea 

5 

13-Sep-00

1 

M 

75 

110 

20 

5 

3-Nov-00 

NATL2A-34 

Red  Sea 

50 

13-Sep-00 

0.001 

M 

75 

30+ 

? 

5 

3-Nov-00 

NATL2A-35 

Red  Sea 

50 

13-Sep-00 

0.01 

M 

75 

30+ 

? 

5 

3-Nov-00

NATL2A-36 

Red  Sea 

100 

13-Sep-00

0.001 

M 

75 

+ 

30+ 

? 

5 

3-Nov-00

NATL2A-37 

Red  Sea 

100 

13-Sep-00

0.01 

M 

75 

150 

20 

5 

3-Nov-00 

NATL2A-38 

Red  Sea 

130 

13-Sep-00 

0.01 

P 

40 

1000 

3-Nov-00 

NATL2A-39 

Red  Sea 

130 

13-Sep-00

1 

3-Nov-00 

NATL2A-40 

Red  Sea 

50 

13-Sep-00

plaque 

M 

75 

120 

25 

300 

3-Nov-00

CLONAL,  great  photos 

NATL2A-41 

Red  Sea 

50 

13-Sep-00 

plaque 

M 

80 

80+ 

25 

30 

3-Nov-00

CLONAL,  great  photos 

NATL2A-42 

Red  Sea 

50 

13-Sep-00 

plaque 

P 

40 

1000 

3-Nov-00

CLONAL 

NATL2A-43 

Red  Sea 

50 

13-Sep-00 

plaque 

P 

50 

? 

300 

3-Nov-00 

CLONAL 

NATL2A-44 

Red  Sea 

5 

13-Sep-00 

plaque 

23-Apr-01

"clear"  plaque 

NATL2A-45 

Red  Sea 

5 

13-Sep-00 

plaque 

23-Apr-01 

"old"  plaque 

NATL2A-46 

Red  Sea 

130 

13-Sep-00

plaque 

23-Apr-01 

NATL2A-47 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-48 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-49 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-50 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-51 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-52 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-53 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-54 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-55 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-56 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 
NATL2A-57

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-58 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-59 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-60 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-61 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-62 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-63 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-64 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-65 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-66 

Sarg 

0 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-67 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-68 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-69 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-70 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-71 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-72 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-73 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-74 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-75 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-76 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-77 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-78 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-79 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-80 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

NATL2A-81 

Sarg 

95 

Sep-2001 

plaque 

Nov/Dec2002 

9107-1 

Red  Sea 

50 

13-Sep-00

plaque 

9201-1 

Red  Sea 

50 

13-Sep-00

plaque 

5-May-01 

CLONAL 

9202-1 

Red  Sea 

50 

13-Sep-00

plaque 

13-Apr-01

CLONAL 

9202-2 

Red  Sea 

50 

13-Sep-00 

plaque 

5-May-01 

CLONAL 

9301-1 

Red  Sea 

50 

13-Sep-00

plaque 

9311-1 

Red  Sea 

50 

13-Sep-00

plaque 

9314-1 

Red  Sea 

50 

13-Sep-00 

plaque 

13-Apr-01

CLONAL 

9314-2 

Red  Sea 

50 

13-Sep-00 

plaque 

5-May-01 

CLONAL 

9401-1 

Red  Sea 

50 

13-Sep-00

plaque 

13-Apr-01

CLONAL 

MED1-1 

Red  Sea 

50 

13-Sep-00 

plaque 

13-Apr-01

CLONAL 

AS9601-1 

Red  Sea 

50 

13-Sep-00

plaque 

13-Apr-01 

CLONAL 

DV-1 

Red  Sea 

50 

13-Sep-00

plaque 

13-Apr-01

CLONAL 

GP2-1 

Red  Sea 

50 

13-Sep-00 

plaque 

SP-1 

Red  Sea 

50 

13-Sep-00 

plaque 

6501-1 

Slope 

0 

17-Sep-01 

plaque 

M 

80 

80+ 

25 

28-Nov-01 

CLONAL 

6501-2 

Slope 

15 

17-Sep-01 

plaque 

28-Nov-01 

6501-3 

Slope 

40 

17-Sep-01 

plaque 

28-Nov-01 

6501-4 

Slope 

60 

17-Sep-01 

plaque 

29-Nov-01 
6501-5

Shelf 

0 

16-Sep-01 

plaque 

M 

80 

110 

25 

29-Nov-01 

CLONAL 

6501-6 

Shelf 

40 

16-Sep-01 

plaque 

29-Nov-01 

6501-7 

Shelf 

50 

16-Sep-01 

plaque 

29-Nov-01 

6501-8 

Shelf 

27 

16-Sep-01 

plaque 

29-Nov-01 

6501-9 

Sarg 

70 

22-Sep-01 

plaque 

M 

80 

110 

25 

29-Nov-01

CLONAL,  great  photos 

6501-10 

Sarg 

30 

22-Sep-01 

plaque 

8-Dec-01 

6501-11 

WHOI 

0 

5-Sep-01 

plaque 

4-Feb-02

7803-1 

Shelf 

0 

16-Sep-01 

plaque 

4-Nov-01

Big  plaque 

7803-2 

Slope 

0 

17-Sep-01 

plaque 

4-Nov-01

7803-3 

Sargasso 

0 

22-Sep-01 

plaque 

4-Nov-01 

7803-4 

Slope 

40 

17-Sep-01 

plaque 

4-Nov-01

7803-5 

Sargasso 

70 

22-Sep-01 

plaque 

4-Nov-01 

7803-6 

Shelf 

50 

16-Sep-01 

plaque 

4-Nov-01

7803-7 

Slope 

83 

17-Sep-01 

plaque 

4-Nov-01 

7803-8 

Sargasso 

130 

22-Sep-01 

plaque 

4-Nov-01

7803-9 

Shelf 

0 

16-Sep-01 

plaque 

9-Nov-01 

little  plaque 

7803-10 

Slope 

60 

17-Sep-01 

plaque 

9-Nov-01

8017-1 

Slope 

15 

17-Sep-01 

plaque 

28-Feb-03 

2xplaque  purified 

8017-2 

Slope 

15 

17-Sep-01 

plaque 

28-Feb-03 

2xplaque  purified 

8018-1 

Shelf 

0 

16-Sep-01 

plaque 

4-Nov-01 

8018-2 

Slope 

0 

17-Sep-01 

plaque 

4-Nov-01

8018-3 

Sargasso 

0 

22-Sep-01 

plaque 

4-Nov-01

2xplaque  purified 

8018-4 

Shelf 

40 

16-Sep-01 

plaque 

4-Nov-01

8018-5 

Slope 

60 

17-Sep-01 

plaque 

4-Nov-01

8018-6 

Slope 

15 

17-Sep-01 

plaque 

29-Nov-01 

8018-7 

Sargasso 

50 

22-Sep-01 

plaque 

29-Nov-01 

8018-8 

Sargasso 

110 

22-Sep-01 

plaque 

29-Nov-01 

2xplaque  purified 

8018-9 

Shelf 

50 

16-Sep-01 

plaque 

9-Nov-01

8101-1 

Slope 

0 

17-Sep-01 

plaque 

9-Nov-01 

8102-1 

Slope 

0 

17-Sep-01 

plaque 

15-Nov-01 

8102-2 

Slope 

40 

17-Sep-01 

plaque 

15-Nov-01 

8102-3 

Slope 

60 

17-Sep-01 

plaque 

15-Nov-01 

8102-4 

Shelf 

0 

16-Sep-01 

plaque 

M 

70 

? 

100 

+ 

20 

+ 

15-Nov-01 

mid-size,  clear  plaque 

8102-5 

Shelf 

0 

16-Sep-01 

plaque 

15-Nov-01 

large  turbid  plaque 

8102-6 

Shelf 

50 

16-Sep-01 

plaque 

15-Nov-01 

8102-7 

Shelf 

40 

16-Sep-01 

plaque 

15-Nov-01 

8102-8 

Sargasso 

0 

22-Sep-01 

plaque 

M 

80 

200 

25 

15-Nov-01 

CLONAL 

8102-9 

Shelf 

27 

16-Sep-01 

plaque 

8-Dec-01 

large  plaque 

8102-10 

Shelf 

27 

16-Sep-01 

plaque 

8-Dec-01 

tiny  plaque 

8102-11 

Sargasso 

0 

22-Sep-01 

plaque 

29-Dec-01 

clear  plaque 

8102-12 

Sargasso 

95 

22-Sep-01 

plaque 

29-Dec-01 

clear  plaque 

8102-13 

Sargasso 

110 

22-Sep-01 

plaque 

29-Dec-01 

clear  plaque 
8109-1

WHOI 

0 

5-Sep-01 

plaque 

4-Feb-02 

8109-2 

Sargasso 

70 

22-Sep-01 

plaque 

4-Feb-02

8109-3 

Sargasso 

95 

22-Sep-01 

plaque 

4-Feb-02

Syn  1 

from  JW  Jan  2003 

Syn  2 

M 

66 

149 

17 

from  JW  Jan  2003 

Syn  5 

P 

60 

t 

from  JW  Jan  2003 

Syn  9 

M 

87 

153 

19 

from  JW  Jan  2003 

Syn  10 

M 

100

145 

19 

from  JW  Jan  2003 

Syn  12 

P 

45 

8 

10 

Syn  14 

M 

93 

136 

21 

Syn  19 

M 

60 

from  JW  Jan  2003 

Syn  26 

from  JW  Jan  2003 

Syn  30 

from  JW  Jan  2003 

Syn  33 

S-WHM1 

M 

88 

108 

23 

from  WW  June  2002 

S-PM2 

M 

90 

165 

20 

from  WW  June  2002 
APPENDIX B. PROPHAGE IN HOST GENOMES

Introduction 

There are two classes of phages: lytic and temperate. Lytic phages infect their host, use the
host cellular machinery to build new phage particles, and then burst the host cell, releasing
progeny particles. Temperate phages, in contrast, infect their hosts and insert their DNA, at least
temporarily, into the host genome as a prophage. Expression of prophage genes often fundamentally
changes the host's physiology, a process known as lysogenic conversion (Calendar, 1988). This
is best studied where prophage encode medically important, pathogen-associated toxin genes
(Boyd, Davis, and Hochhut, 2001; Miao and Miller, 1999; Wagner and Waldor, 2002). Many
prophage-encoded virulence factors confer advantages on the cells that carry them. These factors
range from toxins (e.g., diphtheria toxin, cholera toxin) to conversion genes (e.g., superoxide
dismutase, LPS conversion), and can, for example, allow pathogens to infect a broader range of
hosts or directly improve the cell's fitness. In some cases, prophage are even responsible for
horizontally transferring such toxins between microbial species (Banks, Beres, and Musser, 2002).

How common are prophage in the microbial world? Approximately 70% of the 56
completed bacterial genome sequences surveyed by Canchaya et al. (2003) contain at least one
prophage >10 kb in size. Many microbial genomes contain multiple prophages: 51 of 82 bacterial
species surveyed harbor 230 putative prophages (Casjens, 2003). Prophage are so common that
they often account for a significant fraction of the "strain-specific" DNA between closely related
microbial strains (Baba et al., 2002; Simpson et al., 2000; Smoot et al., 2002). An extreme
example of prophage-driven strain specificity comes from a comparison of the group A
Streptococcus (GAS) serotype M1, M3 and M18 genomes. While these three strains share a core
set of genes (~90%, or 1.7 Mbp), almost all of the strain-specific variation was attributable to 4 to
6 phages and phage-like elements per genome, which accounted for 130 kb (7.1%, 4 phages),
235 kb (~12.4%, 6 phages), and 204 kb (10.8%, 5 phages) in the M1, M3 and M18 genomes,
respectively (Beres et al., 2002).

Prophage genes in microbial genomes are often highly expressed, as shown when genome-wide
expression arrays were used to compare gene expression in a GAS strain at two temperatures
(Smoot et al., 2001). Of the 144 genes that were differentially expressed (~9% of the 1,605 genes
on the microarray), the largest number in any one functional category (22) were annotated as
mobile genetic elements and phages. Another study compared free-living and biofilm
Pseudomonas aeruginosa cells (Whiteley et al., 2001). Of only 73 genes that were differentially
expressed between these two populations, the most highly activated genes were from a temperate
bacteriophage. Such expression can produce the lysogenic conversion described above (Calendar,
1988), and again the best-studied cases involve prophage-encoded, pathogen-associated toxin
genes, such as cholera toxin (Waldor and Mekalanos, 1996), enterotoxin A (Coleman et al.,
1989), and streptococcal pyrogenic exotoxins A (Johnson, Schlievert, and Watson, 1980),
C (Goshorn, Bohach, and Schlievert, 1988), and L and M (Smoot et al., 2002).

Together, these findings emphasize that prophage are not only widespread in prokaryotes,
but also frequently account for strain diversification at both the genome and transcriptome
(expression) levels, often altering the host cell's physiology. However, currently available
freshwater cyanobacterial genomes lack intact prophage (Canchaya et al., 2003; Casjens, 2003),
and no direct observation has confirmed the existence of prophage in marine cyanobacteria.
Indirect measures from induction experiments in the field suggest that temperate cyanophage can
be induced from Synechococcus strains (McDaniel et al., 2002; Ortmann, Lawrence, and Suttle,
2002), but without evidence of intact prophage genomes integrated in the host genome, plausible
alternative explanations remain. During this thesis, three marine cyanobacterial genomes were
sequenced by the Department of Energy Joint Genome Institute (Palenik et al., 2003; Rocap et al.,
2003). We wondered whether these cyanobacterial hosts contained intact prophage and whether
such prophage might influence the metabolic capacity of their hosts. To this end, I analyzed these
host genomes for the presence of intact prophage.

Methods 

The search for prophages was greatly facilitated by conversations with Dr. Sherwood
Casjens. At the time, there were no standard methodologies for searching for prophages. There
was, however, a growing belief that prophage were common in microbial genomes, though their
ubiquity was little discussed in the literature until two recent reviews of the prophage identified in
the plethora of microbial genomes that had become available (Canchaya et al., 2003; Casjens,
2003).

I used the following method to find prophage (Casjens, pers. comm.):

(1) Use BLASTp searches (e-value cut-off < 1e-3) to look for phage genes within the host genome
that cluster within ~60 kb of DNA sequence.

(2) Treat phage genes encoding the portal and terminase proteins (termed "cornerstone" genes) as
strong indicators of a prophage, because they have no known function in the host. In contrast,
genes such as ribonucleotide reductases, which are common in phage but also found in hosts,
should not be used to identify a prophage.

(3) In each cluster of phage-related genes within a genome, use synteny (gene order) arguments
and iterative PSI-BLAST searches to determine whether other genes in the surrounding region
might also be phage-related, building confidence in putative prophages.

(4) If a convincing prophage is identified, locate its ends by looking for >10 bp direct repeats
that mark the site of integration.
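The four steps above can be sketched in Python as a screen over per-ORF BLASTp results. This is a minimal illustration under stated assumptions, not the procedure or script actually used in this work: the `Hit` record, the annotation strings, the `min_genes=3` threshold (borrowed from the 3-6 gene clusters reported in the Results) and the greedy clustering rule are hypothetical choices. Only the 1e-3 e-value cut-off, the ~60 kb window, the portal/terminase "cornerstone" genes and the >10 bp direct-repeat criterion come from the method described here.

```python
from dataclasses import dataclass
from typing import List, Optional

EVALUE_CUTOFF = 1e-3                   # step (1): BLASTp e-value cut-off
WINDOW_BP = 60_000                     # step (1): clustering window, ~60 kb
CORNERSTONE = ("portal", "terminase")  # step (2): phage-specific genes
MIN_REPEAT_BP = 11                     # step (4): direct repeats >10 bp


@dataclass
class Hit:
    """Hypothetical per-ORF record: best BLASTp hit against phage proteins."""
    orf: str
    start: int               # ORF start coordinate in the host genome (bp)
    evalue: Optional[float]  # None if the ORF had no phage hit
    annotation: str          # description of the best phage hit


def is_phage_like(hit: Hit) -> bool:
    return hit.evalue is not None and hit.evalue < EVALUE_CUTOFF


def cluster_phage_hits(hits: List[Hit], window: int = WINDOW_BP,
                       min_genes: int = 3) -> List[List[Hit]]:
    """Step (1): greedily group phage-like ORFs whose start positions lie
    within `window` bp of the first ORF in the group; keep only groups with
    at least `min_genes` members (an assumed threshold)."""
    phage = sorted((h for h in hits if is_phage_like(h)), key=lambda h: h.start)
    clusters: List[List[Hit]] = []
    current: List[Hit] = []
    for hit in phage:
        if current and hit.start - current[0].start > window:
            if len(current) >= min_genes:
                clusters.append(current)
            current = []
        current.append(hit)
    if len(current) >= min_genes:
        clusters.append(current)
    return clusters


def has_cornerstone(cluster: List[Hit]) -> bool:
    """Step (2): a cluster is a strong prophage candidate only if it holds a
    portal or terminase gene; genes shared with hosts are not counted."""
    return any(key in h.annotation.lower() for h in cluster for key in CORNERSTONE)


def find_direct_repeat(left_flank: str, right_flank: str,
                       min_len: int = MIN_REPEAT_BP) -> Optional[str]:
    """Step (4): return the longest sequence of at least `min_len` bp present
    in both flanks of a candidate prophage (a putative att site), or None."""
    for k in range(min(len(left_flank), len(right_flank)), min_len - 1, -1):
        left_kmers = {left_flank[i:i + k] for i in range(len(left_flank) - k + 1)}
        for j in range(len(right_flank) - k + 1):
            if right_flank[j:j + k] in left_kmers:
                return right_flank[j:j + k]
    return None
```

Step (3), iterative PSI-BLAST and synteny, is left out because it depends on external tools and curator judgement. With toy input, a cluster made of an integrase, a ribonucleotide reductase and a tail-fiber hit would be reported by `cluster_phage_hits` but rejected by `has_cornerstone`, which is the pattern described in the Results below.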

Results 

Examination of the BLAST analyses for the Prochlorococcus MED4, Prochlorococcus
MIT9313 and Synechococcus WH8102 genomes suggested that 5 putative prophage regions were
worth further investigation because these regions contained clusters of 3-6 phage-like genes.

Putative  prophage  regions  initially  identified  by  clustering  of  putative  phage  genes  were: 

• In Prochlorococcus MED4 (617,000 to 665,000, includes ORFs or1228 to or1260 and or0916
to or0963; 1,328,000 to 1,367,000, includes ORFs or1709 to or1756 and or0437 to or0476)

• In Prochlorococcus MIT9313 (714,000 to 758,000, includes ORFs or0963 to or1001 and
or2925 to or2945)

• In Synechococcus WH8102 (1,127,000 to 1,175,000, includes ORFs or0142 to or0178 and
or3452 to or3418; 1,291,000 to 1,345,000, includes ORFs or0244 to or0300 and or3289 to
or3335)

Iterative PSI-BLAST searches and synteny (gene order) were used to see whether further phage
genes could be detected within these regions (Frank Larimer's group at ORNL wrote a script to
automate the PSI-BLAST searches for 3 iterations). These analyses did not reveal any more phage
genes, but did identify many "host" genes within all five of these regions. Further, three of the
five putative prophage were significantly homologous to regions in one or both of the other
genomes, which suggested these regions were vertically inherited from ancestral genomes.
None of these putative prophage passed the remaining tests for identifying prophage
(steps 2, 3 and 4 in the Methods section), so it was concluded that there are no intact prophage in
these genomes (see Discussion).
Discussion

While 5 regions from the three cyanobacterial genomes contained clusters of 3-6 phage
genes, none of these regions proved to be bona fide prophage on subsequent testing. There were
no "cornerstone" phage genes in these regions, and there were many "host" genes, which suggested
these regions were not prophage. One caveat to these analyses is that sequence-based search tools
may be limited in finding phage genes (and phage structural genes in particular) because
temperate cyanophage genes and genomes are absent from the databases. However, sequencing of
three Prochlorococcus cyanophage genomes by the Department of Energy Joint Genome Institute
shows that many cyanophage genes have significant homology to other known phage genes
(Chapter 5), so it is likely that these results are accurate and did not miss intact prophage because
of database limitations.

While none of the three marine cyanobacterial genomes examined contained intact
prophage, it seems likely that prophage do occur in these hosts in the oceans. The presence of
intact and degraded phage-related integrases in all three genomes (this Appendix) suggests that
prophage have integrated into these genomes in the past (Palenik et al., 2003). It is also plausible
that, even if these cyanobacterial isolates once contained prophage, such prophage could have been
induced during stressful culture conditions many stationary phases ago (in our batch culturing
over the past 10+ years).

Integrase  genes  as  evidence  of  past  prophage  events? 

Integrase family recombinases are used by prophage and mobile elements to integrate
their genomes into a host genome (Nunes-Duby et al., 1998; Williams, 2002). However, these
recombinases perform a broader range of functions than prophage integration alone: they may
also be involved in conjugative transposition, resolution of concatenated DNA circles, regulation
of plasmid copy number, DNA excision that controls expression of nitrogen fixation genes in
Anabaena, and DNA inversions that control expression of cell surface proteins or DNA
replication (Nunes-Duby et al., 1998).

To determine whether there were phage-related integrase genes in our genomes, I searched
the three host genomes for all ORFs with sequence homology to known integrases. I then
examined this list of candidate integrase genes using Clustal alignments against known integrases
and resolvases, looking for a suite of conserved motifs identified in a review of 105 integrase
family recombinases (Nunes-Duby et al., 1998): the highly conserved, catalytically important
residues (R-H-R-Y) as well as some less conserved sites ("GT" and "LLGH") (Table I).
Interestingly, the single MED4 integrase has an orthologue in MIT9313 (or3443), and two
MIT9313 ORFs, one an integrase and one an integrase fragment, are orthologous to ORFs in
Synechococcus WH8102 (or3056, or1261).
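The residue check described above can be sketched as follows. The column indices in `TETRAD` are hypothetical placeholders, not values from the Clustal alignments used here; in practice they would be read off the alignment at the positions where a reference integrase carries its conserved R, H, R and Y. The sketch only illustrates the logic: a candidate is scored as INT-like when every conserved residue is present, so a truncated ORF that ends before the C-terminal tyrosine fails the test.

```python
from typing import List, Tuple

# Hypothetical alignment columns (0-based) holding the conserved R-H-R-Y
# residues of a reference integrase. Real indices depend on the alignment.
TETRAD: Tuple[Tuple[str, int], ...] = (("R", 12), ("H", 40), ("R", 43), ("Y", 55))


def tetrad_status(aligned_seq: str, tetrad=TETRAD) -> List[Tuple[str, int, bool]]:
    """For each conserved residue, report whether the aligned candidate
    carries it; columns past the end of the sequence count as missing."""
    status = []
    for residue, col in tetrad:
        present = col < len(aligned_seq) and aligned_seq[col].upper() == residue
        status.append((residue, col, present))
    return status


def looks_like_int(aligned_seq: str, tetrad=TETRAD) -> bool:
    """True only if every conserved residue is present (cf. the "INT?" column)."""
    return all(present for _, _, present in tetrad_status(aligned_seq, tetrad))
```

Reporting per-residue status rather than a single yes/no makes it easy to see which part of a degenerate or fragmented ORF is missing.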

In spite of finding many putative integrase family recombinases (Table I), the regions
surrounding these ORFs do not appear to be part of an intact prophage (i.e., a region containing
many phage-related genes without "obviously non-phage-related genes"). Interestingly, many of
these integrase genes appear fragmented (MIT9313 ORFs or3853, or2797, or0731, or1796 and
Synechococcus ORFs or0150, or3298, or3591, or3050), and a few of these degenerate and putative
integrase genes are associated with a deviation in %GC and/or lie next to a tRNA gene. Taken
together, the high number of intact and fragmented integrase genes is suggestive evidence of past
prophage integration events (Palenik et al., 2003).

The possible integrase gene in MED4 (or1025) may have been part of an ancient
evolutionary event, as it lies in a region that is orthologous in MED4, MIT9313 and WH8102, and
its closest sequence homology is to the cyanobacterial xisA gene. In Anabaena PCC7120, this gene
has evolved functions useful to the host: it acts as a site-specific recombinase that excises large
DNA fragments disrupting the nitrogenase gene cluster during differentiation of cells into
heterocysts (Apte and Prabhavathi, 1994). Further analysis of these recombinases in marine
cyanobacteria could be done with this alternative function in mind.


Table I: Possible integrase genes in Prochlorococcus and Synechococcus genomes as well as in
the Prochlorococcus cyanophage P-SSP7. Data included here represent amino acid motifs
identified from the alignments of 105 integrase genes surveyed in Nunes-Duby et al. (1998);
R-H-R-Y was specifically identified in that paper, while the GT and LLGH motifs were added to
my search. Notes about the surrounding gene arrangement within each genome are given in the
"notes from genome" column. All of the ORFs that have "Y" in the "INT?" column are related to
the INT family of site-specific recombinases by BLAST homology and have the critical amino
acids known to be required for classification as an INT. Note that not all INT genes are
prophage-related.
Table I: Putative integrase genes in marine cyanobacterial genomes
The marine unicellular cyanobacterium Prochlorococcus is the
smallest-known oxygen-evolving autotroph1. It numerically
dominates the phytoplankton in the tropical and subtropical
oceans2,3, and is responsible for a significant fraction of global
photosynthesis. Here we compare the genomes of two Prochlorococcus
strains that span the largest evolutionary distance within
the Prochlorococcus lineage4 and that have different minimum,
maximum and optimal light intensities for growth5. The
high-light-adapted ecotype has the smallest genome (1,657,990 base
pairs, 1,716 genes) of any known oxygenic phototroph, whereas
the genome of its low-light-adapted counterpart is significantly
larger, at 2,410,873 base pairs (2,275 genes). The comparative
architectures of these two strains reveal dynamic genomes that
are constantly changing in response to myriad selection
pressures. Although the two strains have 1,350 genes in common, a
significant number are not shared, and these have been
differentially retained from the common ancestor, or acquired through
duplication or lateral transfer. Some of these genes have obvious
roles in determining the relative fitness of the ecotypes in
response to key environmental variables, and hence in regulating
their distribution and abundance in the oceans.

As an oxyphototroph, Prochlorococcus requires only light, CO2
and inorganic nutrients, so the opportunities for extensive niche
differentiation are not immediately obvious, particularly in view of
the high mixing potential in the marine environment (Fig. 1a). Yet
co-occurring Prochlorococcus cells that differ in their ribosomal
DNA sequence by less than 3% have different optimal light
intensities for growth6, pigment contents7, light-harvesting
efficiencies5, sensitivities to trace metals8, nitrogen usage abilities9 and
cyanophage specificities10 (Fig. 1b, c). These 'ecotypes' (distinct
genetic lineages with ecologically relevant physiological
differences) would be lumped together as a single species on the
basis of their rDNA similarity11, yet they have markedly different
distributions within a stratified oceanic water column, with
high-light-adapted ecotypes most abundant in surface waters, and their
low-light-adapted counterparts dominating deeper waters12
(Fig. 1a). The detailed comparison between the genomes of two
Prochlorococcus ecotypes we report here reveals many of the genetic
foundations for the observed differences in their physiologies and
vertical niche partitioning, and together with the genome of their
close relative Synechococcus13 helps to elucidate the key factors that
regulate species diversity, and the resulting biogeochemical cycles,
in today's oceans.

Table 1 General features of two Prochlorococcus genomes

Genome feature                      MED4        MIT9313
Length (bp)                         1,657,990   2,410,873
G+C content (%)                     30.8        50.7
Protein coding (%)                  88          82
Protein coding genes                1,716       2,275
  With assigned function            1,134       1,366
  Conserved hypothetical            502         709
  Hypothetical                      80          197
Genes with orthologue in:
  Prochlorococcus MED4              -           1,352
  Prochlorococcus MIT9313           1,352       -
  Synechococcus WH8102              1,394       1,710
Genes without orthologue in:
  MED4 and WH8102                   -           527
  MIT9313 and WH8102                284         -
Transfer RNA                        37          43
Ribosomal RNA operons               1           2
Other structural RNAs               3           3

The genome of Prochlorococcus MED4, a high-light-adapted
strain, is 1,657,990 base pairs (bp). This is the smallest of any
oxygenic phototroph, significantly smaller than that of the
low-light-adapted strain MIT9313 (2,410,873 bp; Table 1). The genomes
of MED4 and MIT9313 each consist of a single circular chromosome
(Supplementary Fig. 1), and encode 1,716 and 2,275 genes
respectively, roughly 65% of which can be assigned a functional category
(Supplementary Fig. 2). Both genomes have undergone numerous
large and small-scale rearrangements but they retain conservation
of local gene order (Fig. 2). Break points between the orthologous
gene clusters are commonly flanked by transfer RNAs, suggesting
that these genes serve as loci for rearrangements caused by internal
homologous recombination or phage integration events.

The strains have 1,352 genes in common, all but 38 of which are
also shared with Synechococcus WH8102 (ref. 13). Many of the 38
'Prochlorococcus-specific' genes encode proteins involved in the
atypical light-harvesting complex of Prochlorococcus, which
contains divinyl chlorophylls a and b rather than the phycobilisomes
that characterize most cyanobacteria. They include genes encoding
the chlorophyll a/b-binding proteins (pcb)14, a putative chlorophyll
a oxygenase, which could synthesize (divinyl) chlorophyll b from
(divinyl) chlorophyll a15, and a lycopene epsilon cyclase involved in
the synthesis of alpha-carotene16. This remarkably low number of
'genus-defining' genes illustrates how differences in a few gene
families can translate into significant niche differentiation among
closely related microbes.

Figure 1 Ecology, physiology and phylogeny of Prochlorococcus ecotypes. a, Schematic
stratified open-ocean water column illustrating vertical gradients allowing niche
differentiation. Shading represents degree of light penetration. Temperature and salinity
gradients provide a mixing barrier, isolating the low-nutrient/high-light surface layer from
the high-nutrient/low-light deep waters. Photosynthesis in surface waters is driven
primarily by rapidly regenerated nutrients, punctuated by episodic upwelling. b, Growth
rate (filled symbols) and chlorophyll b:a ratio (open symbols) as a function of growth
irradiance for MED4 (ref. 7) (green) and MIT9313 (ref. 6) (blue). c, Relationships between
Prochlorococcus and other cyanobacteria inferred using 16S rDNA.

MED4 has 364 genes without an orthologue in MIT9313, whereas
MIT9313 has 923 that are not present in MED4. These strain-specific
genes, which are dispersed throughout the chromosome
(Fig. 2), clearly hold clues about the relative fitness of the two strains
under different environmental conditions. Almost half of the 923
MIT9313-specific genes are in fact present in Synechococcus
WH8102, suggesting that they have been lost from MED4 in the
course of genome reduction. Lateral transfer events, perhaps
mediated by phage10, may also be a source of some of the
strain-specific genes (Supplementary Figs 3-6).
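The strain-specific gene counts quoted above follow from Table 1 by simple subtraction, and the table's orthologue rows can be cross-checked against one another. The sketch below does this arithmetic in Python; it assumes, as the table implicitly does, that each gene is counted once (one-to-one orthologue calls), and the variable names are illustrative only.

```python
# Gene counts from Table 1 and the text (MED4 = high-light, MIT9313 = low-light).
MED4_GENES = 1716
MIT9313_GENES = 2275
SHARED_MED4_MIT9313 = 1352   # orthologues shared by the two Prochlorococcus strains
SHARED_NOT_IN_WH8102 = 38    # of those, the "Prochlorococcus-specific" genes
MED4_WITH_WH8102 = 1394      # MED4 genes with a WH8102 orthologue
MIT9313_WITH_WH8102 = 1710   # MIT9313 genes with a WH8102 orthologue
MED4_IN_NEITHER = 284        # MED4 genes absent from both MIT9313 and WH8102
MIT9313_IN_NEITHER = 527     # MIT9313 genes absent from both MED4 and WH8102


def strain_specific_counts() -> dict:
    med4_specific = MED4_GENES - SHARED_MED4_MIT9313
    mit9313_specific = MIT9313_GENES - SHARED_MED4_MIT9313
    return {
        "med4_specific": med4_specific,
        "mit9313_specific": mit9313_specific,
        # genes shared by all three genomes
        "core_all_three": SHARED_MED4_MIT9313 - SHARED_NOT_IN_WH8102,
        # strain-specific genes that nonetheless have a WH8102 orthologue
        "med4_specific_in_wh8102": med4_specific - MED4_IN_NEITHER,
        "mit9313_specific_in_wh8102": mit9313_specific - MIT9313_IN_NEITHER,
    }


counts = strain_specific_counts()
for name, value in counts.items():
    print(f"{name}: {value}")
# Fraction of MIT9313-specific genes also found in WH8102 ("almost half")
print(round(counts["mit9313_specific_in_wh8102"] / counts["mit9313_specific"], 3))
```

The counts come out as 364 and 923 strain-specific genes, 1,314 genes shared by all three genomes, and 80 and 396 strain-specific genes with a WH8102 orthologue. The cross-checks close exactly (1,314 + 80 = 1,394 and 1,314 + 396 = 1,710, matching the WH8102 row of Table 1), and 396 of 923 (about 43%) is the "almost half" cited in the text.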

Gene loss has played a major role in defining the Prochlorococcus
photosynthetic apparatus. MED4 and MIT9313 are missing many
of the genes encoding phycobilisome structural proteins and
enzymes involved in phycobilin biosynthesis15. Although some of
these genes remain, and are functional17, others seem to be evolving
rapidly within the Prochlorococcus lineage18. Selective genome
reduction can also be seen in the photosynthetic reaction centre
of Prochlorococcus. Light acclimation in cyanobacteria often
involves differential expression of multiple, but distinct, copies of
genes encoding photosystem II D1 and D2 reaction centre proteins
(psbA and psbD respectively)19. However, MED4 has a single psbA
gene, MIT9313 has two that encode identical photosystem II D1
polypeptides, and both possess only one psbD gene, suggesting a
diminished ability to photoacclimate. MED4 has also lost the gene
encoding cytochrome c550 (psbV), which has a crucial role in the
oxygen-evolving complex in Synechocystis PCC6803 (ref. 20).

There are several differences between the genomes that help
account for the different light optima of the two strains. For
example, the smaller MED4 genome has more than twice as many
genes (22 compared with 9) encoding putative high-light-inducible
proteins, which seem to have arisen at least in part through
duplication events15. MED4 also possesses a photolyase gene that
has been lost in MIT9313, probably because there is little selective
pressure to retain ultraviolet damage repair in low-light habitats.
Regarding differences in light-harvesting efficiencies, it is
noteworthy that MED4 contains only a single gene encoding the
chlorophyll a/b-binding antenna protein Pcb, whereas MIT9313
possesses two copies. The second type has been found exclusively in
low-light-adapted strains21, and may form an antenna capable of
binding more chlorophyll pigments.

Both strains have a low proportion of genes involved in
regulatory functions. Compared with the freshwater cyanobacterium
Thermosynechococcus elongatus (genome size <2.6 megabases)22,
MIT9313 has fewer sigma factors, transcriptional regulators and
two-component sensor-kinase systems, and MED4 is even more
reduced (Supplementary Table 1). The circadian clock genes
provide an example of this reduction, as both genomes lack several
components (pex, kaiA) found in the model Synechococcus
PCC7942 (ref. 23). However, genes for the core clock proteins
(kaiB, kaiC) remain in both genomes, and Prochlorococcus cell
division is tightly synchronized to the diel light/dark cycle24.
Thus, loss of some circadian components may imply an alternative
signalling pathway for circadian control.

Gene loss may also have a role in the lower G+C content of
MED4 (30.8%) compared with that of MIT9313
(50.74%), which is more typical of marine Synechococcus. MED4
lacks genes for several DNA repair pathways, including
recombinational repair (recJ, recQ) and damage reversal (mutT). In particular,
the loss of the base excision repair gene mutY, which removes
adenines incorrectly paired with oxidatively damaged guanine
residues, may imply an increased rate of G·C to T·A
transversions25. The tRNA complement of MED4 is largely identical to that of
MIT9313 and is not optimized for a low-G+C genome,
suggesting that it is not evolving as fast as codon usage.
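To make the proposed mechanism concrete, the toy sketch below computes G+C content and applies G·C to T·A transversions at chosen positions, the class of mutation expected to accumulate in the absence of mutY. It only illustrates why such transversions can lower, and never raise, G+C content; it is not a model of MED4 genome evolution, and the sequence and positions are arbitrary.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_to_ta_transversions(seq: str, positions) -> str:
    """Replace a G·C pair with a T·A pair at each given position.
    On the strand shown, G becomes T and C becomes A; A and T are unchanged."""
    swap = {"G": "T", "C": "A"}
    bases = list(seq.upper())
    for i in positions:
        bases[i] = swap.get(bases[i], bases[i])
    return "".join(bases)


original = "GGCCAATT"
mutated = gc_to_ta_transversions(original, [0, 2])
print(mutated, gc_content(original), gc_content(mutated))
```

Here two transversions turn `GGCCAATT` (G+C fraction 0.5) into `TGACAATT` (0.25); positions already holding A or T are left untouched, so the operation can only remove G·C pairs.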

Analysis of the nitrogen acquisition capabilities of the two strains
points to a sequential decay in the capacity to use nitrate and nitrite
during the evolution of the Prochlorococcus lineage (Fig. 3a). In
Synechococcus WH8102, which represents the presumed ancestral
state, many nitrogen acquisition and assimilation genes are
grouped together (Fig. 3a). MIT9313 has lost a 25-gene cluster,
which includes genes encoding the nitrate/nitrite transporter and
nitrate reductase. The nitrite reductase gene has been retained in
MIT9313, but it is flanked by a proteobacterial-like nitrite
transporter rather than a typical cyanobacterial nitrate/nitrite permease
(Supplementary Fig. 4), suggesting acquisition by lateral gene
transfer. An additional deletion event occurred in MED4, in
which the nitrite reductase gene was also lost (Fig. 3a). As a result
of these serial deletion events, MIT9313 cannot use nitrate, and
MED4 cannot use nitrate or nitrite9. Thus each Prochlorococcus
ecotype uses the N species that is most prevalent at the light levels to
which it is best adapted: ammonium in the surface waters and
nitrite at depth (Fig. 1a). Synechococcus, the only one of the
three that has nitrate reductase, is able to bloom when nitrate is
upwelled (Fig. 1a), as occurs in the spring in the North Atlantic3 and
the northern Red Sea26.

The two Prochlorococcus strains are also less versatile in their organic N usage capabilities than Synechococcus WH8102 (ref. 13). MED4 contains the genes necessary for usage of urea, cyanate and oligopeptides, but no monomeric amino acid transporters have been identified. In contrast, MIT9313 contains transporters for urea, amino acids and oligopeptides but lacks the genes necessary for cyanate usage (cyanate transporter and cyanate lyase) (Fig. 3a). As expected, both genomes contain the high-affinity ammonium transporter amt1 and both lack the nitrogenase genes essential for nitrogen fixation. Finally, both contain the nitrogen transcriptional regulator encoded by ntcA, and numerous genes in both genomes, including ntcA, amt1, the urea transport genes and the GS/GOGAT genes (glutamine synthetase and glutamate synthase, both involved in ammonia assimilation), have an upstream NtcA-binding-site consensus sequence.
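The search for upstream NtcA-binding sites can be illustrated with the minimal Python sketch below. It assumes the commonly cited NtcA consensus GTA-N8-TAC, which the text does not spell out. It also assumes a hypothetical input that maps gene identifiers to their upstream sequences. This is not the authors' pipeline. A real scan would also allow mismatches and consider the spacing of each site relative to the promoter.

```python
import re

# Assumed NtcA consensus: GTA, any 8 bases, TAC. The motif is its own reverse
# complement, so scanning one strand also finds sites on the opposite strand.
# The zero-width lookahead reports overlapping candidate sites.
NTCA_SITE = re.compile(r"(?=(GTA[ACGT]{8}TAC))")


def find_ntca_sites(upstream: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for each consensus match."""
    seq = upstream.upper()
    return [(m.start(), m.group(1)) for m in NTCA_SITE.finditer(seq)]


def genes_with_ntca_sites(upstream_by_gene: dict[str, str]) -> dict[str, list[tuple[int, str]]]:
    """Keep only genes whose upstream region contains at least one candidate site."""
    hits = {gene: find_ntca_sites(seq) for gene, seq in upstream_by_gene.items()}
    return {gene: sites for gene, sites in hits.items() if sites}


if __name__ == "__main__":
    # Toy upstream regions (hypothetical, not real Prochlorococcus sequence).
    regions = {"geneA": "TTGTAACCGGTTATACGG", "geneB": "AAAACCCCGGGGTTTT"}
    print(genes_with_ntca_sites(regions))  # prints {'geneA': [(2, 'GTAACCGGTTATAC')]}
```

Because the assumed motif is palindromic, a single forward-strand regular expression is enough. A non-palindromic motif would need a second pass over the reverse complement.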

The genomes also differ in genes involved in phosphorus usage, with obvious ecological implications. MED4, but not MIT9313, is capable of growth on organic P sources (L. R. Moore and S.W.C., unpublished data), and organic P can be the prevalent form of P in high-light surface waters27. This difference may be due to the acquisition of an alkaline phosphatase-like gene in MED4 (Supplementary Fig. 5). Both genomes contain the high-affinity phosphate transport system encoded by pstS and pstABC (ref. 28), but MIT9313 contains an additional copy of the phosphate-binding component pstS, perhaps reflecting an increased reliance on orthophosphate in deeper waters. MED4 contains several P-related regulatory genes, including the phoB/phoR two-component system and the transcriptional activator ptrA. In MIT9313, however, phoR is interrupted by two frameshifts and ptrA is further degenerated, suggesting that this strain has lost the ability to regulate gene expression in response to changing P levels.

Figure 2 Global genome alignment as seen from start positions of orthologous genes. Genes present in one genome but not the other are shown on the axes. The broken 'X' pattern has been noted before for closely related bacterial genomes, and is probably due to multiple inversions centred around the origin of replication. Alternating slopes of many adjacent gene clusters indicate that multiple smaller-scale inversions have also occurred.

Both Prochlorococcus strains have iron-related genes that are missing in Synechococcus WH8102, which may explain the dominance of Prochlorococcus in the iron-limited equatorial Pacific2. These genes include flavodoxin (isiB), an Fe-free electron transfer protein capable of replacing ferredoxin, and ferritin (located with the ATPase component of an iron ABC transporter), an iron-binding molecule implicated in iron storage. Additional characteristics of the iron acquisition system in these genomes include: an Fe-induced transcriptional regulator (Fur) that represses iron uptake genes; numerous genes with an upstream putative fur box motif that are candidates for a high-affinity iron scavenging system; and the absence of genes involved in Fe-siderophore complexes.

Prochlorococcus does not use typical cyanobacterial genes for inorganic carbon concentration or fixation. Both genomes contain a sodium/bicarbonate symporter but lack homologues of known families of carbonic anhydrases, suggesting that an as yet unidentified gene fulfils this function. One of the two carbonic anhydrases in Synechococcus WH8102 was lost in the deletion event that led to the loss of the nitrate reductase (Fig. 3a); the other is located next to a tRNA and seems to have been lost during a genome rearrangement event. Like other Prochlorococcus and marine Synechococcus, MED4 and MIT9313 possess a form IA ribulose-1,5-bisphosphate carboxylase/oxygenase, rather than the typical cyanobacterial form IB. The ribulose-1,5-bisphosphate carboxylase/oxygenase genes are adjacent to genes encoding structural carboxysome shell proteins, and all have phylogenetic affinity to genes in the γ-proteobacterium Acidithiobacillus ferrooxidans", suggesting lateral transfer of the extended operon.

Prochlorococcus has been identified in deep suboxic zones where it is unlikely to sustain itself by photosynthesis alone29, so we looked for genomic evidence of heterotrophic capability. Indeed, the presence of oligopeptide transporters in both genomes, and the larger proportion of transporters (including some sugar transporters) among the MIT9313 strain-specific genes (Supplementary Fig. 2), suggest the potential for partial heterotrophy. However, neither genome contains known pathways that would allow for complete heterotrophy. Both are missing genes for steps in the tricarboxylic acid cycle, including 2-oxoglutarate dehydrogenase, succinyl-CoA synthetase and succinyl-CoA-acetoacetate-CoA transferase.

Figure 3 Dynamic architecture of marine cyanobacterial genomes. a, Deletion, acquisition and rearrangement of nitrogen usage genes. In MIT9313, 25 genes including the nitrate/nitrite transporter (nrtP/napA), nitrate reductase (narB) and carbonic anhydrase have been deleted. The cyanate transporter and cyanate lyase (cynS) were probably lost after the divergence of MIT9313 from the rest of the Prochlorococcus lineage, as MED4 possesses these genes. MIT9313 has retained nitrite reductase (nirA) and acquired a nitrite transporter. In MED4, nirA has been lost and the urea transporter (urt cluster) and urease (ure cluster) genes have been rearranged (dotted line). Genes in different functional categories are colour-coded to guide the eye. b, Lateral transfer of genes involved in lipopolysaccharide biosynthesis, including sugar transferases, sugar epimerases, modifying enzymes and two pairs of ABC-type transporters. Blue, genes in all three genomes; pink, genes hypothesized to have been laterally transferred; red, tRNAs; white, other genes. The G+C content of MIT9313 along this segment (42%) is lower than the whole-genome average (horizontal line).

Cell surface chemistry has a major role in phage recognition and grazing by protists, and is thus probably under intense selective pressure in nature. The two Prochlorococcus genomes and the Synechococcus WH8102 genome show evidence of extensive lateral gene transfer and deletion events of genes involved in lipopolysaccharide and/or surface polysaccharide biosynthesis, reinforcing the role of predation pressures in the creation and maintenance of microdiversity. For example, MIT9313 has a 41.8-kilobase (kb) cluster of surface polysaccharide genes (Fig. 3b) with a lower G+C content (42%) than the genome as a whole, implicating acquisition by lateral gene transfer. MED4 has acquired a 74.5-kb cluster consisting of 67 potential surface polysaccharide genes (Supplementary Fig. 6a) and has lost another cluster of surface polysaccharide biosynthesis genes shared between MIT9313 and Synechococcus WH8102 (Supplementary Fig. 6b).
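A window-based G+C comparison, of the kind that flags a segment such as the 41.8-kb MIT9313 cluster, might look like the minimal sketch below. The window size, step and margin are hypothetical parameters. The text reports only the segment's 42% G+C against the genome-wide average. A low G+C signature suggests lateral transfer but does not prove it.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return (seq.count("G") + seq.count("C")) / acgt


def low_gc_windows(genome: str, window: int, step: int, margin: float) -> list[tuple[int, float]]:
    """Return (start, G+C fraction) for windows at least `margin` below the genome average."""
    average = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        chunk = genome[start:start + window]
        if not any(base in "ACGTacgt" for base in chunk):
            continue  # skip windows made up entirely of gaps or ambiguity codes
        gc = gc_fraction(chunk)
        if gc <= average - margin:
            flagged.append((start, gc))
    return flagged


if __name__ == "__main__":
    # Toy genome (hypothetical): a 50-bp AT-rich island inside GC-rich sequence.
    toy = "GC" * 50 + "AT" * 25 + "GC" * 50
    print(low_gc_windows(toy, window=50, step=50, margin=0.2))  # prints [(100, 0.0)]
```

Adjacent flagged windows would normally be merged into one candidate island. They would then be checked for other transfer signatures, such as flanking tRNAs or atypical codon usage.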

The approach we have taken in describing these genomes highlights the known drivers of niche partitioning of these closely related organisms (Fig. 1). Detailed comparisons with the genomes of additional strains, such as Prochlorococcus SS120 (ref. 30), will enrich this story, and the analysis of whole genomes from in situ populations will be necessary to understand the full extent of genomic diversity in this group. The genes of unknown function in all of these genomes hold important clues to undiscovered niche dimensions in the marine pelagic zone. As we unveil their function, we will undoubtedly learn that the suite of selective pressures that shape these communities is much larger than we have imagined. Finally, it may be useful to view Prochlorococcus and Synechococcus as important 'minimal life units', as the information in their roughly 2,000 genes is sufficient to create globally abundant biomass from solar energy and inorganic compounds.

Methods 

Genome  sequencing  and  assembly 

DNA was isolated from the clonal, axenic strain MED4 and the clonal strain MIT9313 essentially as described previously4. The two whole-genome shotgun libraries were obtained by fragmenting genomic DNA using mechanical shearing and cloning 2–3-kb fragments into pUC18. Double-ended plasmid sequencing reactions were carried out using PE BigDye Terminator chemistry (Perkin Elmer), and sequencing ladders were resolved on PE 377 Automated DNA Sequencers (Perkin Elmer). The whole-genome sequence of Prochlorococcus MED4 was obtained from 27,065 end sequences (7.3-fold redundancy), whereas Prochlorococcus MIT9313 was sequenced to 6.2-fold coverage (33,383 end sequences). For Prochlorococcus MIT9313, supplemental sequencing (0.05-fold sequence coverage) of a pFOS1 fosmid library was used as a scaffold. Sequence assembly was accomplished using PHRAP (P. Green). All gaps were closed by primer walking on gap-spanning library clones or PCR products. The final assembly of Prochlorococcus MED4 was verified by long-range genomic PCR reactions, whereas the assembly of Prochlorococcus MIT9313 was confirmed by comparison to the fosmid clones, which were fingerprinted with EcoRI. No plasmids were detected in the course of genome sequencing, and insertion sequences, repeated elements, transposons and prophages are notably absent from both genomes. The likely origin of replication in each genome was identified based on G+C skew, and base pair 1 was designated adjacent to the dnaN gene.
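The G+C skew criterion can be sketched as follows. The sketch assumes the common cumulative-skew heuristic: in many bacteria the leading strand is G-rich, so the running total of G minus C bottoms out near the origin and peaks near the terminus. The Methods do not state which skew calculation was used. The `rotate_to` helper is a hypothetical illustration of renumbering a circular genome so that a chosen position becomes base pair 1.

```python
def cumulative_gc_skew(seq: str) -> list[int]:
    """Running total of (G - C) along the sequence, one value per base."""
    total = 0
    running = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        running.append(total)
    return running


def predict_origin_and_terminus(seq: str) -> tuple[int, int]:
    """0-based positions of the cumulative-skew minimum (origin) and maximum (terminus)."""
    running = cumulative_gc_skew(seq)
    origin = min(range(len(running)), key=running.__getitem__)
    terminus = max(range(len(running)), key=running.__getitem__)
    return origin, terminus


def rotate_to(seq: str, start: int) -> str:
    """Rotate a circular sequence so that position `start` becomes base pair 1."""
    return seq[start:] + seq[:start]


if __name__ == "__main__":
    # Toy circular "genome" (hypothetical): C-rich, then G-rich, then C-rich.
    toy = "C" * 10 + "G" * 20 + "C" * 10
    print(predict_origin_and_terminus(toy))  # prints (9, 29)
```

In real genomes the skew is noisy at base-pair resolution. The curve is therefore usually examined in windows, and the predicted origin is refined using landmarks such as dnaA/dnaN and DnaA boxes.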

Genome  annotation 

A combination of three gene-modelling programs, Critica, Glimmer and Generation, was used to identify potential open reading frames, and the predictions were checked manually. A revised gene/protein set was searched against the KEGG GENES, Pfam, PROSITE, PRINTS, ProDom, COGs and CyanoBase databases, in addition to a BLASTP search against the non-redundant peptide sequence database from GenBank. From these results, categorizations were developed using the KEGG and COGs hierarchies, as modified in CyanoBase. Manual annotation of open reading frames was done in conjunction with the Synechococcus team. The three-way genome comparison was used to refine predicted start sites, add open reading frames and standardize the annotation across the three genomes.

Genome  comparisons 

The comparative genome architecture of MED4 and MIT9313 was visualized using the Artemis Comparison Tool (http://www.sanger.ac.uk/Software/ACT/). Orthologues were determined by aligning the predicted coding sequences of each gene with the coding sequences of the other genome using BLASTP. Genes were considered orthologues if each was the best hit of the other and both e-values were less than e^-10. In addition, bidirectional best hits that met only a second, less stringent e-value cutoff, as well as small proteins of conserved function, were manually examined and added to the orthologue lists.
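The reciprocal-best-hit rule above can be sketched in Python. The sketch assumes the BLASTP results have already been parsed into dictionaries that map each query gene to its (subject, e-value) hits, for example from BLAST+ tabular output (`-outfmt 6`). The 1e-10 default reflects one reading of the cutoff stated above. The gene identifiers in the example are hypothetical. The manual additions described in the text are not modelled.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (subject gene id, e-value)


def best_hits(hits: Dict[str, List[Hit]]) -> Dict[str, Hit]:
    """Lowest-e-value hit for every query that has at least one hit."""
    return {query: min(hs, key=lambda h: h[1]) for query, hs in hits.items() if hs}


def reciprocal_best_hits(
    a_vs_b: Dict[str, List[Hit]],
    b_vs_a: Dict[str, List[Hit]],
    max_evalue: float = 1e-10,
) -> List[Tuple[str, str]]:
    """Pairs (gene_a, gene_b) that are each other's best hit, both e-values below max_evalue."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for gene_a, (gene_b, e_ab) in best_ab.items():
        reverse = best_ba.get(gene_b)
        if reverse is None:
            continue  # gene_b has no hits back into genome A
        back_gene, e_ba = reverse
        if back_gene == gene_a and e_ab < max_evalue and e_ba < max_evalue:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical hits: a1/b1 are mutual best hits; b2 prefers a1 over a2;
    # a3/b3 are mutual best hits but fail the e-value cutoff.
    a_vs_b = {"a1": [("b1", 1e-50), ("b2", 1e-20)], "a2": [("b2", 1e-30)], "a3": [("b3", 1e-5)]}
    b_vs_a = {"b1": [("a1", 1e-48)], "b2": [("a1", 1e-40), ("a2", 1e-29)], "b3": [("a3", 1e-6)]}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))  # prints [('a1', 'b1')]
```

Requiring reciprocity guards against calling a paralogue an orthologue when one genome carries extra copies of a gene family. The e-value cutoff discards weak, possibly spurious matches.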

Phylogenetic analyses used PAUP*, with LogDet distances and minimum evolution as the objective function. The degree of support at each node was evaluated using 1,000 bootstrap resamplings. Ribosomal DNA analyses used 1,160 positions. The Gram-positive bacterium Arthrobacter globiformis was used to root the tree.
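Bootstrap support of the kind reported here can be sketched in Python. Pseudo-replicate alignments are built by drawing alignment columns with replacement, and a node's support is the fraction of replicates whose tree recovers that clade. The `infer_clades` argument is a hypothetical stand-in for the tree-building step, which in the paper was done in PAUP* with LogDet distances and minimum evolution. This sketch does not reimplement that step.

```python
import random
from typing import Callable, Dict, FrozenSet, Set

Alignment = Dict[str, str]  # taxon name -> aligned sequence, all the same length


def resample_columns(aln: Alignment, rng: random.Random) -> Alignment:
    """One bootstrap pseudo-replicate: alignment columns drawn with replacement."""
    length = len(next(iter(aln.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in aln.items()}


def bootstrap_support(
    aln: Alignment,
    infer_clades: Callable[[Alignment], Set[FrozenSet[str]]],
    clade: FrozenSet[str],
    replicates: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of pseudo-replicates whose inferred tree contains `clade`."""
    rng = random.Random(seed)
    found = sum(clade in infer_clades(resample_columns(aln, rng)) for _ in range(replicates))
    return found / replicates


if __name__ == "__main__":
    # Hypothetical toy alignment and a trivial stand-in "tree builder" that
    # groups A with B whenever their resampled sequences are identical.
    toy = {"A": "AAAACG", "B": "AAAACG", "C": "CCCCTG"}
    toy_infer = lambda a: {frozenset({"A", "B"})} if a["A"] == a["B"] else set()
    print(bootstrap_support(toy, toy_infer, frozenset({"A", "B"}), replicates=100))
```

Each replicate keeps the original alignment length, as in the standard nonparametric bootstrap. With 1,000 replicates, support values are resolved to the nearest 0.1%.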
Prochlorococcus is the numerically dominant phototroph in the tropical and subtropical oceans, accounting for half of the photosynthetic biomass in some areas1–4. Here we report the isolation of cyanophages that infect Prochlorococcus, and show that although some are host-strain-specific, others cross-infect with closely related marine Synechococcus as well as between high-light- and low-light-adapted Prochlorococcus isolates, suggesting a mechanism for horizontal gene transfer. High-light-adapted Prochlorococcus hosts yielded Podoviridae exclusively, which were extremely host-specific, whereas low-light-adapted Prochlorococcus and all strains of Synechococcus yielded primarily Myoviridae, which have broad host ranges. Finally, both Prochlorococcus and Synechococcus strain-specific cyanophage titres were low (<10^3 ml^-1) in stratified oligotrophic waters even where total cyanobacterial abundances were high (>10^5 cells ml^-1). These low titres in areas of high total host cell abundance seem to be a feature of open ocean ecosystems. We hypothesize that gradients in cyanobacterial population diversity, growth rates and/or the incidence of lysogeny underlie these trends.

Phages are thought to evolve by the exchange of genes drawn from a common gene pool through differential access imposed by host range limitations5. Similarly, horizontal gene transfer, important in microbial evolution4,5, can be mediated by phages6 and is probably responsible for many of the differences in the genomes of closely related microbes5. Recent detailed analyses of molecular phylogenies constructed for marine Prochlorococcus and Synechococcus7,8 (Fig. 1) show that these genera form a single group within the marine picophytoplankton clade (>96% identity in 16S ribosomal DNA sequences), yet display microdiversity in the form of ten well-defined subgroups8. We have used members of these two groups to study whether phage isolated on a particular host strain cross-infect other hosts and, if so, whether the probability of cross-infection is related to the rDNA-based evolutionary distance between the hosts.


Analyses of host range were conducted (Fig. 1) with 44 cyanophages, isolated as previously described10 from a variety of water depths and locations (see Supplementary Information) using 20 different host strains chosen to represent the genetic diversity of Prochlorococcus and Synechococcus8. Although we did not examine how these patterns would change if phage were propagated on different hosts, this would undoubtedly add another layer of complexity due to host range modifications resulting from methylation of phage DNA6. Similar to those that infect other marine bacteria" and Synechococcus10–14, our Prochlorococcus cyanophage isolates fell into three morphological families: Myoviridae, Siphoviridae and Podoviridae15.

As would be predicted10–14, Podoviridae were extremely host-specific, with only two cross-infections out of a possible 300 (Fig. 1). Similarly, the two Siphoviridae isolated were specific to their hosts. In instances of extreme host specificity, in situ host abundance would need to be high enough to facilitate phage-host contact. It is noteworthy in this regard that members of the high-light-adapted Prochlorococcus cluster, which yielded the most host-specific cyanophage, have high relative abundances in situ16. The Myoviridae exhibited much broader host ranges, with 102 cross-infections out of a possible 539. They not only cross-infected among and between Prochlorococcus ecotypes but also between Prochlorococcus and Synechococcus. Those isolated with Synechococcus host strains have broader host ranges and are more likely to cross-infect low-light-adapted than high-light-adapted Prochlorococcus strains. The low-light-adapted Prochlorococcus are less diverged from Synechococcus than high-light-adapted Prochlorococcus7,8, suggesting a relationship, in this instance, between the probability of cross-infection and the rDNA relatedness of hosts. Finally, we tested the Myoviridae for cross-infection against marine bacterial isolates closely related to Pseudoalteromonas, which are known to be broadly susceptible to diverse bacteriophages (bacterial strains HER1320, HER1321, HER1327, HER1328)". None of the Myoviridae cyanophages infected these bacteria.
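The cross-infection frequencies quoted above can be put on a common scale with simple arithmetic. The minimal Python sketch below uses only the counts given in the text: 2 of 300 possible combinations for Podoviridae and 102 of 539 for Myoviridae. The helper name is hypothetical, and the text does not describe how the "possible" totals were tallied.

```python
def cross_infection_rate(observed: int, possible: int) -> float:
    """Fraction of tested phage-host combinations that produced cross-infection."""
    if possible <= 0 or not 0 <= observed <= possible:
        raise ValueError("need possible > 0 and 0 <= observed <= possible")
    return observed / possible


if __name__ == "__main__":
    podo = cross_infection_rate(2, 300)    # Podoviridae counts from the text
    myo = cross_infection_rate(102, 539)   # Myoviridae counts from the text
    print(f"Podoviridae: {podo:.1%}, Myoviridae: {myo:.1%}, ratio: {myo / podo:.0f}x")
    # prints "Podoviridae: 0.7%, Myoviridae: 18.9%, ratio: 28x"
```

On this scale, Myoviridae cross-infected roughly 28 times as often as Podoviridae per combination tested.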

Phage morphotypes isolated were determined, to some degree, by the host used for isolation (Fig. 1). For example, ten of ten cyanophages isolated using high-light-adapted Prochlorococcus strains were Podoviridae. In contrast, all but two cyanophages isolated on Synechococcus were Myoviridae, a bias that has been reported by others", and over half of those isolated on low-light-adapted Prochlorococcus belonged to this morphotype. We further substantiated these trends by examining lysates (as opposed to plaque-purified isolates) from a range of host strains, geographic locations and depths: of 58 Synechococcus lysates, 93% contained Myoviridae; of 43 low-light-adapted Prochlorococcus lysates, 65% contained Myoviridae; and of 107 high-light-adapted Prochlorococcus lysates, 98% contained Podoviridae (see Supplementary Information).

Maximum cyanophage titres, using a variety of Synechococcus hosts, are usually found to be within an order of magnitude of the total Synechococcus abundance10,14,17,18, and can be as high as 10^6 phage ml^-1. One study17 has shown, for example, that along a transect in which total Synechococcus abundance decreased from 10^5 cells ml^-1 to 250 cells ml^-1, maximum cyanophage titres remained at least as high as the total number of Synechococcus. We wondered whether titres of Prochlorococcus cyanophage in the Sargasso Sea, where Prochlorococcus cells are abundant (10^5 cells ml^-1), would be comparable to those measured in coastal oceans for Synechococcus, where total Synechococcus host abundances are of similar magnitude. We assayed cyanophage titres in a depth profile in the Sargasso Sea at the end of seasonal stratification using 11 strains of Prochlorococcus (Fig. 2), choosing at least one host strain from each of the six phylogenetic clusters that span the rDNA-based genetic diversity of our culture collection".

Three Prochlorococcus host strains (MIT 9303, MIT 9313 and SS120) yielded low or no cyanophage. Other hosts yielded titres
Chisholm Supplementary Figure 1
Supp. Figure 1. Circular representation of the Prochlorococcus genomes. a, MED4. b, MIT 9313. For both genomes, the outermost circles (1 and 2) are predicted protein coding regions on the plus and minus strands, respectively. Color coding is as in Supplementary Figure 2. The next two circles show genes not present in the other Prochlorococcus genome on the plus (circle 3) and minus (circle 4) strands. Circles 5 and 6 show genes on the plus and minus strands, respectively, that contain transmembrane domains. Circle 7 is % G+C content (deviation from average). The innermost circle (8) represents the GC skew curve.
Supp. Figure 2. Functional categorization of predicted open reading frames in the Prochlorococcus genomes, following the classification scheme used by CyanoBase. a, MED4, entire genome. b, MIT 9313, entire genome. c, Genes present in both MED4 and MIT 9313. d, Genes in MED4 not present in MIT 9313. e, Genes in MIT 9313 not present in MED4.
Supp. Figure 3. Comparison of Prochlorococcus MED4 and MIT 9313 open reading frames with those of other complete prokaryotic genomes. The predicted coding sequences of each gene in both genomes were aligned with the coding sequences of 90 bacterial genomes using BLASTP. Significant alignments were defined as having an e-value less than 10^-6. The bacterial genomes comprised the 89 completed bacterial genomes available from ftp.ncbi.nih.gov/genbank/genomes/Bacteria on 30 October 2002 and Synechococcus WH 8102s. a, MED4, entire genome. b, MIT 9313, entire genome. c, MED4 genes present in MIT 9313. d, MIT 9313 genes present in MED4. e, Genes in MED4 not present in MIT 9313. f, Genes in MIT 9313 not present in MED4.
Supp. Figure 4. Alignment of the putative nitrite transporter in Prochlorococcus MIT9313 (PMT2240) with its most significant matches in the NR database (all proteobacteria) and with cyanobacterial nitrate/nitrite transporters. The MIT 9313 gene has a formate/nitrite transporter domain (Pfam PF01226), in contrast to the cyanobacterial nitrate transporters, which are permeases of the major facilitator superfamily (Pfam PF00083). Furthermore, the MIT 9313 gene has no significant matches (BLASTP e-value < e^-2) in the genomes of Prochlorococcus MED4, Synechococcus WH8102, Synechocystis sp. PCC 6803, Thermosynechococcus elongatus BP-1 or Anabaena sp. PCC 7120, suggesting it may have been acquired via lateral gene transfer. Alignment generated using ClustalW. Shaded residues indicate >50% similarity. Abbreviations and accession numbers are as follows: Rhodopseud., Rhodopseudomonas palustris (ZP_00012718.1); Bradyrhiz., Bradyrhizobium japonicum (UP_769A^); Vibrio, Vibrio vulnificus (NP_762336.1); Nitros., Nitrosomonas europaea (NP_840759); WH 7803, Synechococcus WH 7803 napA (AAG45172); PCC 7002, Synechococcus PCC 7002 nrtP (AAD45941); WH9601, Trichodesmium WH 9601 napA (AAF00917); PCC 73102, Nostoc punctiforme PCC 73102 (ZP_00107423).
Supp. Figure 5. Phylogenetic tree showing the relationship of a possible alkaline phosphatase-like gene in Prochlorococcus MED4 (PMM0708) with the most significant matches in the NR database, which include several proteobacterial sequences, and with the atypical alkaline phosphatase of Synechococcus PCC 7942 and related cyanobacterial genes. Accession numbers are as follows: Brucella melitensis (NP_541633.1); Agrobacterium tumefaciens str. C58 (NP_531956.1); Sinorhizobium meliloti (NP_385365.1); Vibrio vulnificus (NP_762849.1); Streptomyces coelicolor A3(2) (NP_624650.1); Shewanella oneidensis MR-1 (NP_717877.1); Anabaena PCC 7102 (NP_489331.1); Synechocystis sp. PCC 6803 (NP_440276); Synechococcus sp. PCC 7942 (A47026).
Supp. Figure 6. Insertions, deletions and rearrangements of genes involved in lipopolysaccharide biosynthesis (LPS clusters) in MED4. Color coding is as follows: blue, orthologous genes present in all three genomes; pink, genes hypothesized to be part of lateral transfer events (many have roles in LPS biosynthesis); red, tRNAs; green, orthologous genes present in two genomes (many have roles in LPS biosynthesis); white, other genes. Length in bp represents the size of the region shown for each genome. a, Insertion of a 74.5-kbp cluster of LPS genes in MED4, roughly between two tRNAs. The 67 potential surface polysaccharide genes in this cluster include sugar transferases, sugar epimerases and modifying enzymes such as aminotransferases, methyltransferases, carbamoyltransferases and acetyltransferases. In MIT 9313 and WH 8102, the genes that flank this insertion are rearranged to other parts of the genome. b, Deletion of LPS biosynthesis genes in MED4. LPS-related genes present in MIT 9313 and WH 8102, several of which have homologs in the acquired genes shown in part a, have been deleted. In this region, a selenophosphate synthase (selD) and a tRNA nucleotidyltransferase in the center of the cluster have been retained, suggesting that they are essential genes and that separate deletion events have occurred on either side of them.
Supp. Table 1. Number of predicted signal transduction and transcription factors suggests reduced regulatory capacity in Prochlorococcus.
Genomes compared: MED4, MIT 9313, T. elongatus.
Categories: sigma factors; two-component systems (histidine kinases, response regulators); Ser/Thr protein kinases; transcription factors (LuxR, LysR, CRP, ArsR and FUR families; other); light sensors/transducers (cryptochrome, bacteriophytochrome, phototropin).
Marine phage genomics
Abstract

Marine phages are the most abundant biological entities in the oceans. They play important roles in carbon cycling through marine food webs, gene transfer by transduction and conversion of hosts by lysogeny. The handful of marine phage genomes sequenced to date, together with prophages in marine bacterial genomes and partial sequences of uncultivated phages, are yielding glimpses of the tremendous diversity and physiological potential of the marine phage community. Common gene modules in diverse phages are providing the information necessary to make evolutionary comparisons. Finally, deciphering phage genomes is providing clues about the adaptive response of phages and their hosts to environmental cues.

©  2002  Elsevier  Science  Inc.  All  rights  reserved. 

Keywords: Phages; Genomics; Lysogeny; Roseophage; Synechococcus; Prochlorococcus


1.  Introduction 

Direct counts show that there are approximately 3–10 virus-like particles for every cell in the marine environment (Bergh et al., 1989; reviewed in Wommack and Colwell, 2000 and Fuhrman, 1999). Bacteria and Archaea are the most common cells in seawater, and it is believed that most of the virus-like particles are phages that prey upon these prokaryotes. Since the oceans are the world's largest biosphere, marine phages are probably the most abundant biological entities on the planet. Through their lytic activities, phages modulate carbon flow through microbial food webs by attacking both autotrophic and heterotrophic microbes (reviewed in Fuhrman, 1999). As prophages, marine phages may also confer a wide range of traits on their hosts, including: immunity to superinfection (Hershey, 1971); toxin production (Waldor and Mekalanos, 1996); and the capability to transfer modular blocks of genes (Jiang and Paul, 1998a; Paul, 1999).

* Contribution to a special issue of CBP on Comparative Functional Genomics.

Corresponding author. Tel.: +1-727-553-1168; fax: +1-

Even though 'The Age of Genomics' was heralded by the sequencing of the Escherichia coli phage φX174 in 1977 (Sanger et al., 1977), there are only three completed marine phage genomes currently in GenBank. This number will undoubtedly increase over the next decade. Here, we review the current state of the field of marine phage genomics and argue that these genomes, because of their small size, offer unprecedented opportunities for exploring eco-genomics, testing evolutionary models and understanding genetic transduction within the environment.

Fig. 1. Phage and the cycling of organic carbon through marine food webs (panels: Classical Marine Food Web; Marine Microbial Food Web). Arrows indicate direction of organic carbon flow.

2.  Overview  of  marine  phage  ecology 

2.1.  Phage  effects  on  carbon  flow 

The influences of phages on ecosystem dynamics are best understood in the marine environment. The marine microbial food web (MMFW) is the consortium of heterotrophic and autotrophic prokaryotes, as well as their predators, that inhabit the Earth's oceans and seas (Azam, 1998). The MMFW regulates the transfer of energy and nutrients to higher trophic levels and greatly influences global carbon (C) and nutrient cycles (Pomeroy, 1974; Azam et al., 1983). Dissolved organic matter (DOM) is the largest biogenic sink of carbon in the ocean (Kennish, 2001). Because the DOM pool is so large, heterotrophic bacterial populations are not resource limited; instead, they are controlled by predation (Fuhrman and Noble, 1995).


The two predator guilds responsible for top-down control of the MMFW are the protozoa and phages (Fuhrman and Noble, 1995). In near-shore waters, each of these predator guilds accounts for 50% of daily microbial mortality (Fuhrman and Noble, 1995). To put the effects of these two bacterial predator guilds into perspective, ~49.3 Gt of C is fixed by phytoplankton per year in the world's oceans (Field et al., 1998), while global marine bacterial production is estimated to be 26–70 Gt of C per year (Wilhelm and Suttle, 1999). Therefore, the majority of marine biotic C is cycled into microbes, and most of these microbes are killed by protozoan and phage predators.
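The comparison behind this conclusion reduces to a ratio of the two quoted estimates. The minimal Python sketch below uses only the figures given above: 26–70 Gt C per year of bacterial production against ~49.3 Gt C per year of primary production. It divides the range endpoints and makes no assumption about how the original estimates were derived. The helper name is hypothetical.

```python
def production_ratio_range(bacterial_low: float, bacterial_high: float, primary: float) -> tuple[float, float]:
    """Bacterial production as a fraction of primary production, for low and high estimates."""
    if primary <= 0:
        raise ValueError("primary production must be positive")
    return bacterial_low / primary, bacterial_high / primary


if __name__ == "__main__":
    low, high = production_ratio_range(26.0, 70.0, 49.3)  # Gt C per year, from the text
    print(f"bacterial production = {low:.0%} to {high:.0%} of primary production")
    # prints "bacterial production = 53% to 142% of primary production"
```

Even the low estimate corresponds to about half of annual oceanic primary production.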

When bacteria are eaten by protozoa, the carbon can potentially be transferred to the larger members of the marine food web (Fig. 1; Fukami et al., 1999). In contrast, when a bacterium is killed by a lytic phage, both the lysed host cell and the phage become part of the DOM pool (Middelboe et al., 1992, 1996). Since DOM is utilized only by other heterotrophic bacteria that are also susceptible to lytic phages, this carbon never leaves the MMFW. The more rapidly this cycle repeats itself, the more respiratory CO2 is produced, leaving less organic carbon stored in the world’s oceans (Fig. 1). Thus, phage activity may prove troublesome for proposed efforts to fertilize the oceans for increased carbon sequestration and fisheries yields.
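The claim that faster cycling produces more respiratory CO2 can be illustrated with a toy model. The sketch below assumes a hypothetical, constant bacterial growth efficiency (the fraction of DOM taken up that becomes biomass, here 0.3). The remainder is respired, and phage lysis returns the new biomass to the DOM pool before the next pass. Neither the efficiency value nor the one-pass-per-cycle structure comes from the text.

```python
def viral_shunt(initial_doc: float, growth_efficiency: float, cycles: int) -> tuple[float, float]:
    """Toy model of repeated uptake-and-lysis cycles.

    Each cycle, bacteria take up the organic carbon pool, respire the fraction
    (1 - growth_efficiency) as CO2 and build biomass from the rest; phage lysis
    then returns that biomass to the DOM pool. Returns (organic carbon left,
    cumulative CO2 respired), in the same units as initial_doc.
    """
    organic = initial_doc
    respired = 0.0
    for _ in range(cycles):
        respired += organic * (1.0 - growth_efficiency)
        organic *= growth_efficiency
    return organic, respired


# Hypothetical growth efficiency of 0.3; more cycles per unit time -> more CO2.
for n in (1, 2, 5):
    organic, co2 = viral_shunt(100.0, 0.3, n)
    print(f"{n} cycle(s): {organic:.1f} organic C left, {co2:.1f} respired as CO2")
```

In this toy model the organic carbon left after n cycles is initial_doc × efficiency^n, so the respired share approaches 100% as the number of cycles per unit time increases.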

In the open ocean, Prochlorococcus and Synechococcus are the numerically dominant autotrophs (reviewed in Waterbury et al., 1986; Partensky et al., 1999). Phages that infect these important primary producers have been isolated (Suttle, 1993; Suttle and Chan, 1993; Waterbury and Valois, 1993; Wilson et al., 1993; Sullivan, 2001). Ecological studies of the impact of cyanophages on Synechococcus communities suggest that direct mortality is low relative to that observed for heterotrophic bacteria (reviewed in Suttle, 2000). However, cyanophage pressure may still significantly affect the population structure and diversity of these systems (Waterbury and Valois, 1993). In near-shore communities, viruses have been isolated that infect the major ‘large’ phytoplankton species (e.g. eukaryotic diatoms and dinoflagellates). While viruses are not thought to be the major agent of cyanobacterial mortality, virus-induced mortality may be responsible for the ‘sudden crashes’ that terminate many blooms of eukaryotic algae (Sieburth et al., 1988; Bratbak, 1993; Nagasaki et al., 1993; Bratbak et al., 1995, 1998).

2.2.  Transduction  in  marine  environments 

Besides their enormous influence on marine biogeochemistry, phages have important effects on genetic exchange in the marine environment (Jiang and Paul, 1998a). Phages can mediate DNA exchange between different bacteria by transduction, which occurs when host DNA is accidentally packaged into a phage particle during assembly (Masters, 1996; Weisberg, 1996). When the mispackaged phage infects another bacterium, it transfers DNA from its former host instead of injecting phage DNA. Jiang and Paul (1998a) estimated that 1.3 × 10^14 transduction events occur per year in the Tampa Bay Estuary, Florida. Extrapolation suggests that marine phages transduce 10^28 base pairs of DNA per year in the world’s oceans.
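To give a sense of scale, the annual figures above can be converted to per-second rates. This is simple unit arithmetic on the numbers in the text (using a 365.25-day year). It does not reproduce the extrapolation itself, whose inputs are not given here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s

tampa_bay_events_per_year = 1.3e14  # Jiang and Paul (1998a)
global_bp_per_year = 1e28           # extrapolated global estimate quoted in the text

tampa_bay_events_per_second = tampa_bay_events_per_year / SECONDS_PER_YEAR
global_bp_per_second = global_bp_per_year / SECONDS_PER_YEAR

print(f"Tampa Bay: {tampa_bay_events_per_second:.1e} transduction events per second")
print(f"Global: {global_bp_per_second:.1e} bp transduced per second")
```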


2.3.  Lysogeny  in  marine  environments 

Not all phage-host encounters lead to host cell lysis; many instead result in lysogeny or pseudolysogeny (Ackermann and DuBow, 1987). Through meticulous work at the single-cell level with Bacillus, lysogeny was first described as ‘the hereditary power to produce bacteriophage’ (Lwoff, 1953). This ‘hereditary power’ is due to the integration of invading phage DNA into the host cell genome (the integrated phage is now termed a prophage) rather than progression through the lytic pathway. The prophage remains integrated in the host cell genome until it is induced to ‘abandon ship’ and proceed through the lytic pathway. The molecular mechanisms underlying prophage integration and excision are well understood in model systems (Hershey, 1971). In contrast, pseudolysogeny is a poorly understood phenomenon. It is often invoked to describe conditions where constant phage production occurs in the presence of a high abundance of host cells, thus allowing large numbers of host cells and their phages to coexist. Two mechanisms that might explain such observations are: (1) a mixture of sensitive and resistant host cells; or (2) a mixture of temperate and virulent phages (Williamson et al., 2001).

Lysogeny has been shown to improve the general fitness of the host (Edlin et al., 1975), largely through lysogenic conversion, that is, the expression of prophage-encoded genes. A common lysogenic conversion phenotype is immunity to superinfection (Hershey, 1971), but lysogenic conversion can also result in altered structural characteristics (Pruzzo and Satta, 1988; Vaca-Pacheco et al., 1999; Mirold et al., 2001), as well as resistance to antibiotics (Mlynarczyk et al., 1997) and reactive oxygen species (Figueroa-Bossi and Bossi, 1999). Of particular importance and global significance is the spread of toxin/virulence genes (often termed ‘pathogenicity islands’) by lysogenic conversion. Diphtheria, botulinum, cholera, pertussis, shiga and many other exotoxins are prophage encoded (reviewed in Davis and Waldor, 2002). Recent evidence suggests that these toxin genes can be transferred by transduction and other lateral gene transfer mechanisms (Boyd and Waldor, 1999; Faruque et al., 1999; Yaron et al., 2000).

Lysogeny is a common phenomenon in the marine environment. A recent study suggests that lysogeny is common among cultivated bacteria from oligotrophic waters, as 40% of 110 marine bacterial isolates produced phage or bacteriocin-like particles upon treatment with an inducing agent (Jiang and Paul, 1998b). Efforts to quantify lysogeny in natural populations have yielded a wide range of values for the proportion of the population lysogenized. For example, Weinbauer and Suttle (1996) found that 1.5-11.4% of the microbial population in the Gulf of Mexico was lysogenized, whereas a detailed seasonal study of lysogeny in Tampa Bay indicated that the lysogenic fraction could range from 0 to 100% (average 27.6 ± 37.1%; Williamson et al., 2002). During the seasonal study, lysogeny was primarily detected in winter months, consistent with the theory that lysogeny is favored in times of low host cell density. The environmental factors that control lysogeny in the marine environment are largely unknown, although links to nutrients such as phosphate have been suggested (Tuomi et al., 1995; Wilson et al., 1996, 1997). The molecular control of lysogeny in many phage-host systems is complex (Ptashne, 1992; Friedman and Court, 2001) and usually involves genomic elements termed lysogeny modules (Lucchini et al., 1999). Essentially nothing is known of the molecular or environmental control of these genomic elements in marine phages. In addition to investigations of bacterial lysogens in the environment, two recent studies have suggested that natural populations of the cyanobacterium Synechococcus can be lysogenized (McDaniel et al., 2002; Ortmann et al., 2002).

2.4.  Microbial  and  phage  diversity  in  the  marine 
environment 

Through their role as species-specific predators, phages may also help maintain microbial diversity. In the absence of host cell resistance, and provided that contact rates remain high, lytic phages could potentially lyse all individuals of a species; thus, phage attack can result in a rapid succession of microbial species (Thingstad and Lignell, 1997; Wommack and Colwell, 2000). Experimental evidence that phages exert a strong selective pressure on microbial populations comes from host-range analyses of phage isolates. It also comes from the observation that very closely related bacterial species, and even strains of the same species, are infected by different phages (Moebus, 1991; Suttle and Chan, 1993, 1994; Waterbury and Valois, 1993).


The biodiversity of marine phages is essentially unknown. The few studies that have addressed this question suggest that diversity is high (reviewed in Borsheim, 1993). Moebus and colleagues (Moebus, 1991, 1992a,b; Moebus and Nattkemper, 1991) screened over 900 isolates of culturable marine bacteria and found that approximately one-third were susceptible to at least one, and often multiple, lytic phages. Over the course of these studies, the authors concluded that: (1) the majority of bacterial strains were probably susceptible to phage infection; and (2) the phages isolated in these studies were specific to single hosts. Further work characterized a subset of these phages using DNA-DNA hybridizations, %GC, genome size estimations and host-range analysis, and showed that they were genetically diverse (Wichels et al., 1998). Kellogg et al. (1995) isolated 60 phages from Florida and Hawaii that infected Vibrio parahaemolyticus and analyzed them by restriction fragment length polymorphism (RFLP), host specificity and Southern blotting. RFLP analysis separated the 60 phage isolates into six distinct groups, which were then further genetically characterized by Southern blotting with a 1.5-kb DNA probe cloned from one of the isolates. Host-range analysis showed that these phages were host specific, as none of the 60 isolates was able to infect closely related Vibrio species. Together, these findings strongly suggest that phage diversity is at least as high as, and probably higher than, the diversity of bacteria. That is, for each bacterium there is probably at least one, and often multiple, phages capable of infecting it, and each phage is usually specific to a single microbial species or strain.

3.  The  current  state  of  marine  phage  genomics 

3.1.  Cultured  marine  phage  genomes 

The first marine phage genome to be completely sequenced, Pseudoalteromonas espejiana BAL-31 phi PM2, was isolated off the coast of Chile in the 1960s (Espejo and Canelo, 1968). Phage PM2 was also the first lipid-containing phage ever isolated, and it serves as the type phage for the International Committee on Taxonomy of Viruses (ICTV) family Corticoviridae (Murphy et al., 1995). The phi PM2 genome is a circular, 10 079-bp molecule that is replicated in a rolling-circle fashion (Mannisto et al., 1999). The genes that encode structural and replication proteins have been identified among the 42 potential open reading frames (ORFs) (Mannisto et al., 1999). The phi PM2 genome also contains a plasmid-like maintenance region, suggesting that the genome may be transferred among bacteria either as a plasmid or as a free phage (Mannisto et al., 1999). Essentially nothing is known about the ecology of phi PM2.

Fig. 2. Phylogenetic trees of DNA polymerase genes found in bacteria, Archaea and various phages of marine and terrestrial origin. Marine phage DNA polymerase sequences include V. parahaemolyticus phi VpV262, phi VP16C and phi VP16T; Roseobacter SIO67 phi SIO1 and Synechococcus WH7803 phi P60.

The second marine phage genome to be sequenced, Roseobacter SIO67 phi SIO1, was isolated off the Scripps Institution of Oceanography pier in 1990 (Rohwer et al., 2000). Phage SIO1 has a 39 906-bp dsDNA genome with 34 predicted ORFs (Rohwer et al., 2000). The phi SIO1 primase/helicase, DNA polymerase and endodeoxyribonuclease I share significant similarity to presumed homologs in Escherichia coli phages T3 and T7 (Rohwer et al., 2000). The Synechococcus WH7803 phi P60 genome (Chen and Lu, in press), discussed below, also contains ORFs with significant BLAST hits to these genes. This suggests that there is a group of marine phages that use a DNA replication mechanism much like that of phi T3 and T7 (Fig. 2). The ORFs that probably encode the phi SIO1 structural proteins, as suggested by their position in the genome, do not share significant similarity with other phage proteins currently in GenBank. They may, however, be related to sequences found in the marine phage Vibrio parahaemolyticus phi VpV 262 (Hardies et al., 2002). The phi SIO1 genome also contains a number of tantalizing hints about its ecology. In particular, it appears the phage has linked its lifecycle to the phosphate metabolism of its host cell (see below).

The first cyanophage genome to be completely sequenced, Synechococcus WH7803 phi P60, has a 47 872-bp dsDNA genome that contains 80 potential ORFs (Chen and Lu, in press). The DNA replication genes are related to those of phi SIO1 and phi T7. Interestingly, the phi P60 DNA polymerase gene appears to be more closely related to those encoded by the non-marine phages T3, T7 and phi-Ye03-12 than to the marine phi SIO1 gene (Chen and Lu, in press). Phages P60 and phi SIO1 also encode a ribonucleotide reductase that is probably involved in recycling host rNTPs into dNTPs that can be incorporated into nascent phage genomes. Phage P60 encodes an RNA polymerase, which appears to be absent from the phi SIO1 genome. Since the RNA polymerase is essential to invasion and transcription of T7 (Calendar, 1988), phi P60 and phi SIO1 probably have very different lifecycles.

Table 1
The cultured marine phages that have completed genome sequences

Phage                                 Size (kb)  Morphology    Reference               Genome access
Pseudoalteromonas espejiana phi PM2   10.1       Corticovirus  Mannisto et al., 1999   NC_000867
Roseobacter SIO67 phi SIO1            39.9       Podovirus     Rohwer et al., 2000     NC_002519
Synechococcus WH7803 phi P60          47.8       Podovirus     Chen and Lu, in press   AF338467
Vibrio parahaemolyticus phi TB16T     49.7       Myovirus      -                       -
Vibrio parahaemolyticus phi TB16C     47.5       Myovirus      -                       -
Vibrio parahaemolyticus phi VpV 262   45.9       Podovirus     Hardies et al., 2002    http://biochem.uthscsa.edu/-hs_lab/phage.html

Currently there are only three marine phage genomes available in GenBank. Three other genomes have been sequenced and are currently being annotated.

Three marine phages that infect two strains of the human pathogen Vibrio parahaemolyticus have also been completely sequenced (Table 1). Vibrio parahaemolyticus phi VpV 262 is a podovirus isolated from the Strait of Georgia, BC, Canada (Hardies and Serwer, 2002). Its genome is a 45 874-bp linear dsDNA molecule with 73 predicted ORFs. The genome contains DNA polymerase, primase and helicase genes that appear to be more closely related to their bacterial homologues than to the phi T7 or phi SIO1 genes.

Two other vibriophages, phi TB16T and phi TB16C, were separated from each other from a lysate of phage VP16, isolated from Tampa Bay (Kellogg et al., 1995). The original phage was thought to be a myovirus based on its contractile tail (Kellogg et al., 1995). However, the DNA sequence similarities and the presence of a cos site suggest that the viruses are more likely to be siphoviruses (Segall and Rohwer, unpublished data). The two viruses are closely related (73-91%) over roughly 80% of their genomes, but differ by multiple unique insertions ranging from approximately 200 to 5000 bp in size.


Each virus encodes 63-64 ORFs. The structural genes are clustered on the left side of the genome near the cos site. They include a gene related to the large terminase subunit of phage lambda, as well as genes related to those encoding the tail, tail fiber, portal, sheath and tape measure proteins of various phages and prophages. The rest of the genome includes ORFs similar to a DNA polymerase, a helicase, a polypeptide deformylase and two genes weakly similar to transcriptional regulators. The two vibriophages also contain numerous other genes similar to ORFs encoding hypothetical proteins from other phage, prophage and bacterial genomes. Although the original phage VP16 gave clear plaques, TB16T and TB16C were separated from each other based on their plaque morphology: TB16T gave turbid plaques with clear centers, whereas TB16C gave entirely clear plaques. Although lysogeny has not been proven, the genome sequences provide substantial evidence that these phages have a temperate lifestyle, including the putative regulatory elements and the ability to circularize. Of great interest is that the sequences from the mixture of the two genomes formed separate contigs during sequencing and assembly, despite substantial regions of identity between the two genomes (Rohwer and Segall, unpublished data). This suggests that even very closely related viruses can be sequenced from a mixed lysate, even when, as indicated by restriction digests, one genome makes up less than 5% of the DNA in the lysate. Subsequent sequencing of libraries made from the DNA of the isolated phages showed that we did not obtain any chimeric sequences between the two genomes.

As of this writing, the genome sequence of cyanophage S-PM2 is being finished (Nicholas H. Mann, personal communication). The genome is ~170 kb and includes several very large genes, including several that encode proteins >3000 amino acids in length. There are some rather surprising genes, such as one that is homologous to a gene involved in complementary chromatic adaptation. When finished, this genome should be particularly exciting because it will be the first marine representative of the myoviruses related to coliphage T4.

In addition to marine phages, a number of groups are starting to sequence viruses that infect marine eukaryotes. Emiliania huxleyi is a marine coccolithophorid with a worldwide distribution that forms vast coastal and mid-oceanic blooms (Holligan et al., 1993). Double-stranded DNA viruses that infect E. huxleyi (EhV) have recently been isolated (Wilson et al., in press). Phylogenetic analysis of DNA polymerase gene fragments of these viruses suggests that EhVs belong to a new genus, with the proposed name Coccolithovirus, within the algal virus family Phycodnaviridae (Schroeder et al., in press). Work is currently underway (shotgun sequencing completed, finishing begun) to sequence the 410-kb genome of one of these viruses, with completion expected in summer 2002 (William Wilson, personal communication). Tai et al. (2002) have also reported the complete sequence of a virus associated with lysis of the eukaryotic fish-killer Heterosigma akashiwo.

3.2.  Prophage  in  completed  marine  bacterial 
genomes 

In the near future, prophages contained in marine bacterial genomes will be an important data source for studying marine phages. The best-studied prophage system in a marine bacterial genome is CTX-phi, the lysogenic phage that encodes the cholera toxin in pathogenic strains of Vibrio cholerae (Waldor and Mekalanos, 1996). CTX-phi is a temperate, filamentous phage that is secreted by the same extracellular protein secretion system used to secrete cholera toxin (Davis et al., 2000a). The phage uses as its receptor a type IV pilus that is encoded by an adjacent genetic element, the TCP gene cluster, itself a putative prophage (Karaolis et al., 1999). Three strains of CTX-phi have been characterized, from classical, El Tor and Calcutta isolates of V. cholerae. Interestingly, the classical strain encodes functional phage genes that are expressed, but it cannot produce infectious virions because it occurs as a single prophage element (Davis et al., 2000b). In contrast, the El Tor and Calcutta strains occur in tandem arrays of multiple prophages, or as a single prophage plus prophage parts, and these strains are capable of generating infectious virions (Davis and Waldor, 2000).

The increasing number of publicly available microbial genomes has led to the discovery that prophage elements may exist in nearly every microbial genome (S. Casjens, personal communication). Putative prophages have been identified by inspecting regions of microbial chromosomes for the following characteristics: (1) genes with homology to known phage genes; (2) a contiguous group of genes containing few, if any, obviously non-phage genes; (3) phage genes organized in a phage-like manner (e.g. integrase near the end, structural genes clustered, correctly ordered, etc.); and (4) ‘unknown’ genes in the putative prophage that are not obviously organized in a non-phage-like manner. Using this approach, we identified putative prophages in the complete marine genomes of Prochlorococcus MED4, Prochlorococcus MIT 9313 and Synechococcus WH 8102 (http://www.jgi.doe.gov/). However, these regions were later shown not to be prophage regions, as the phage genes were too few, were not organized in a phage-like manner and were interrupted by obvious non-phage genes.
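Criteria (1)-(3) above lend themselves to a simple scan of an annotated chromosome. The sketch below is hypothetical. It assumes each gene has already been labeled "phage", "nonphage" or "unknown" by homology searches (criterion 1). It reports maximal runs that contain no obviously non-phage genes and at least a minimum number of phage genes (criterion 2), and flags runs with an integrase near either end (one part of criterion 3). The threshold and gene names are illustrative; criterion (4) and structural-gene ordering are not checked.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    category: str  # "phage", "nonphage" or "unknown" (assumed to come from homology searches)


def candidate_prophages(genes: list[Gene], min_phage_genes: int = 5) -> list[tuple[int, int, bool]]:
    """Return (first_index, last_index, integrase_near_end) for each candidate region.

    A candidate is a maximal run of consecutive genes with no 'nonphage' gene that
    contains at least `min_phage_genes` phage-like genes.
    """
    regions = []
    start = None
    # A trailing sentinel closes a run that extends to the end of the chromosome.
    for i, gene in enumerate(genes + [Gene("sentinel", "nonphage")]):
        if gene.category != "nonphage":
            if start is None:
                start = i
            continue
        if start is not None:
            run = genes[start:i]
            n_phage = sum(g.category == "phage" for g in run)
            if n_phage >= min_phage_genes:
                ends = run[:2] + run[-2:]  # "near the end": first or last two genes of the run
                integrase_near_end = any("integrase" in g.name for g in ends)
                regions.append((start, i - 1, integrase_near_end))
            start = None
    return regions


# Hypothetical chromosome segment.
genome = [
    Gene("dnaA", "nonphage"),
    Gene("integrase", "phage"),
    Gene("orf1", "unknown"),
    Gene("terminase_large", "phage"),
    Gene("portal", "phage"),
    Gene("major_capsid", "phage"),
    Gene("tail_tape_measure", "phage"),
    Gene("tail_fiber", "phage"),
    Gene("gyrB", "nonphage"),
    Gene("recA", "nonphage"),
]
print(candidate_prophages(genome))
```

A region interrupted by an obvious non-phage gene, one of the reasons the cyanobacterial candidates above were rejected, is split into separate runs that must each pass the threshold on their own.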

There is significant evidence that some prophages integrate into select regions of a host cell genome. Thirty-four of 58 cases of the integration of genetic elements (including prophages) were found to occur in attB sites within tRNA or tmRNA genes (Williams, 2002). Prophage integration at these sites might be beneficial because: (1) tRNA and tmRNA genes have approximately four- to ninefold lower mutation rates than protein-encoding regions; and (2) these genes are small, so only a small region needs to be mimicked by the phage attP. Of particular interest to phage ecologists is the fact that tRNA promoters are known to be regulated by growth rate (Swenson et al., 1994). In effect, this could allow a prophage integrated into a tRNA gene to monitor the physiological state of its host through transcriptional coupling to the tRNA gene. It is unknown how the integrases of genetic elements recognize tRNA-like elements, but Williams (2002) suggests that the characteristic secondary structure of the DNA molecule may be involved.
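Integration depends on the phage attP sharing a short core sequence with the host attB site. A first step in finding candidate tRNA integration sites is therefore to locate the longest stretch shared by a phage attachment region and a tRNA gene. The sketch below uses a standard dynamic-programming longest-common-substring search. Both sequences are made up for illustration. Real att-site prediction would also need to handle orientation (reverse complements), mismatches and secondary structure, none of which is done here.

```python
def longest_shared_core(att_p: str, trna_gene: str) -> str:
    """Longest common substring of two DNA strings (dynamic programming, O(m*n) time)."""
    best_len, best_end = 0, 0          # length and end position (in att_p) of the best match
    prev = [0] * (len(trna_gene) + 1)  # match lengths ending at the previous att_p position
    for i in range(1, len(att_p) + 1):
        curr = [0] * (len(trna_gene) + 1)
        for j in range(1, len(trna_gene) + 1):
            if att_p[i - 1] == trna_gene[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return att_p[best_end - best_len:best_end]


# Hypothetical sequences: a tRNA-like gene and a phage attachment region sharing a core.
trna_like = "GCGGATTTAGCTCAGTTGGGAGAGCGCCAGACTGAAGATCTGGAGGTCCTGTGTTCGATCCACAGAATTCGCACCA"
att_p = "TTACGTAAGGTCCTGTGTTCGATCCACAGCATGCA"

core = longest_shared_core(att_p, trna_like)
print(core, len(core))
```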

To further aid in the identification of prophages in microbial genomes, we assessed the usefulness of DNA Structural Atlases (http://www.cbs.dtu.dk/services/GenomeAtlas/; Pedersen et al., 2000) as a diagnostic tool. These atlases display DNA structural characters, such as DNA curvature, DNA flexibility and DNA stability, in the form of a color-coded wheel that is useful for visually revealing interesting structural features of a genomic sequence (Pedersen et al., 2000). The five models used to predict these structural characters are based on either empirical data (e.g. DNase I sensitivity, X-ray crystallography data, trinucleotide preferences and gel mobility) or quantum mechanical calculations. To evaluate this tool for predicting prophages, we examined the DNA structural atlases of genomes containing well-characterized prophage elements and looked qualitatively for diagnostic trends. There often appears to be significant structure within the prophage regions. However, this structure is not associated with all known prophages and often occurs in known non-prophage regions throughout the genome. Factors such as the type of prophage and the length of time since its last activity might greatly affect these DNA structural properties, and a more intensive, quantitative analysis may be required to identify diagnostic trends.
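One simple form of the quantitative analysis called for above is to compare a windowed sequence property inside an annotated prophage with its distribution across the rest of the genome. The sketch below uses GC content as a stand-in property, because the curvature, flexibility and stability models behind the atlases are not reproduced here. The synthetic genome, window size and step are assumptions for illustration, and a real test would need a proper null model.

```python
import random
import statistics


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (window start, GC fraction) for each full-length window."""
    return [
        (i, (seq.count("G", i, i + window) + seq.count("C", i, i + window)) / window)
        for i in range(0, len(seq) - window + 1, step)
    ]


def region_z_score(seq: str, region_start: int, region_end: int,
                   window: int = 200, step: int = 100) -> float:
    """z-score of the mean GC of windows fully inside [region_start, region_end),
    relative to the windows that are not fully inside the region."""
    inside, outside = [], []
    for start, gc in windowed_gc(seq, window, step):
        if region_start <= start and start + window <= region_end:
            inside.append(gc)
        else:
            outside.append(gc)
    return (statistics.mean(inside) - statistics.mean(outside)) / statistics.stdev(outside)


# Synthetic genome: 50% GC "host" flanking a 35% GC "prophage" insert (made-up values).
random.seed(1)
host_left = "".join(random.choices("ACGT", weights=[25, 25, 25, 25], k=5000))
prophage = "".join(random.choices("ACGT", weights=[32.5, 17.5, 17.5, 32.5], k=2000))
host_right = "".join(random.choices("ACGT", weights=[25, 25, 25, 25], k=5000))
genome = host_left + prophage + host_right

print(f"z = {region_z_score(genome, 5000, 7000):.1f}")
```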

3.3.  Uncultured  marine  phage  genomes 

Based upon pulsed-field gel electrophoresis of natural marine phage communities, we know that marine viral genomes fall into three main size ranges: 35-40, 50-65 and 120-140 kb (Wommack et al., 1999; Steward et al., 2000). It has long been known that there is little similarity between the marine bacteria that have been cultivated and the phylotypes that 16S rDNA analyses show to be prevalent (Fuhrman and Campbell, 1998). Therefore, cultured marine phages are probably not representative of the community. To circumvent the limitations imposed by culturing, a number of laboratories have started to sequence the genomes of total marine viral communities.

Breitbart et al. (in press) have constructed a shotgun library from an uncultured, near-shore marine viral community. Only 30% of the sequences from this library had appreciable similarity to those in GenBank. Of the significant hits, 32% were phage in origin and 3% were most closely related to eukaryotic viruses. Among the phage genes showing similarity, Podovirus genes were most common (43%). Representatives of the Siphoviridae and Myoviridae were also found. These broad trends have now been observed in a second near-shore library (Breitbart and Rohwer, unpublished data). Another research group has partially sequenced a shotgun library made from a phage community from 70 m in Monterey Bay (Steward and Preston, unpublished data). As with the near-shore phage community, most of the sequences show no similarity to GenBank entries. Both libraries contained sequences with significant similarity to phage DNA polymerase, RNA polymerase, integrase, transposase and reverse transcriptase genes.

Breitbart et al. (in press) proposed a mathematical model that uses the number of observed contigs to predict phage richness and diversity in the sample. According to their calculations, the most abundant phage in the sample made up 4% of the population, and phage diversity is very high (e.g. a Shannon-Weaver index value of 7-8; Shannon and Weaver, 1963). This model also predicts that it is technically possible to sequence an entire marine viral community.
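The Shannon-Weaver index is H' = -Σ p_i ln p_i, where p_i is the relative abundance of genotype i. A community of N equally abundant genotypes has H' = ln N, so values of 7-8 correspond to roughly 1100-3000 equally abundant genotypes. The sketch below computes H' and checks this. The uneven example community (one genotype at 4%, as in the text, plus 2000 equally abundant others) is hypothetical and is not the contig-based model of Breitbart et al.

```python
import math


def shannon_index(abundances: list[float]) -> float:
    """Shannon-Weaver index H' = -sum(p_i * ln p_i) over nonzero abundances."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


# Even community of N genotypes: H' = ln N, so N = exp(H').
for h in (7, 8):
    print(f"H' = {h} <-> about {round(math.exp(h))} equally abundant genotypes")

# Hypothetical uneven community: a dominant genotype at 4% plus 2000 others sharing 96%.
community = [0.04] + [0.96 / 2000] * 2000
print(f"H' of the uneven example = {shannon_index(community):.2f}")
```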

4.  Uses  of  marine  phage  genomes 

4.1.  Classification  of  marine  phages 

A major goal of phage genome sequencing projects should be to provide the information necessary to classify marine phages into guilds that reflect their biology. Current phage taxonomy, as established by the International Committee on Taxonomy of Viruses (ICTV), relies on the morphological characteristics of the free phage particle (Murphy et al., 1995). The ICTV classification, however, provides very little information about the ecological niches or lifestyles of phages. Additionally, the ICTV system does not have sufficient resolution to address phage biodiversity questions, nor will it be useful for analyzing uncultured marine phage or prophage genomes. In response to these shortcomings, numerous groups are actively constructing phage taxonomic systems based on completed genome sequences (Lawrence et al., 2002; Rohwer and Edwards, 2002). These systems will help classify marine phages into families that provide information about their lifecycles and ecological roles, as well as identify phage types that deserve more detailed analyses.

Marine phage genomes are already helping to differentiate phages into operational taxonomic units (OTUs) that predict biological properties. For example, whole-genome analyses and individual DNA polymerase sequences show that podoviruses belong to two OTUs with fundamentally different DNA replication mechanisms (Pecenkova and Paces, 1999; Chen and Lu, in press; Rohwer and Edwards, 2002). The first group, the phi T7-like podophages, includes phi T7, phi T3, phi P60, phi SIO1 and Yersinia enterocolitica phi Ye03-12 (Fig. 2; Rohwer and Edwards, 2002). These T7-like phages replicate their genomes by a primase/DNA polymerase mechanism (Ackermann and DuBow, 1987). A second group, the PZA-like podophages, includes phi PRD1, Bacillus subtilis phi PZA, B. subtilis phi GA-1, B. subtilis phi B103, Streptococcus pneumoniae phi Cp-1 and Mycoplasma sp. phi P1 (Rohwer and Edwards, 2002). The PZA-like podophages replicate their DNA using a covalently linked 5' terminal protein primer (Salas, 1991).
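As a rough illustration of assigning a new phage to one of these OTUs from its DNA polymerase sequence, the sketch below scores a query against reference sequences for each group by shared amino-acid k-mers (Jaccard similarity) and picks the best-scoring group. This is a swapped-in, alignment-free shortcut rather than the phylogenetic analysis behind Fig. 2. The peptide strings are made-up toy fragments, not real polymerase sequences.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_otu(query: str, references: dict[str, list[str]], k: int = 3) -> tuple[str, dict[str, float]]:
    """Return the best-matching group and each group's best Jaccard score."""
    q = kmers(query, k)
    scores = {otu: max(jaccard(q, kmers(ref, k)) for ref in refs)
              for otu, refs in references.items()}
    return max(scores, key=scores.get), scores


# Toy peptide fragments (hypothetical, not real polymerase sequences).
references = {
    "T7-like (primase/DNA polymerase)": ["MSKLVFDIETNGLH", "MSKIVFDIETNGLD"],
    "PZA-like (protein-primed)": ["MPRKMYSCDFETTT", "MPRKLYSCDFETTK"],
}
best, scores = assign_otu("MSKLVFDIETSGLH", references)
print(best, {otu: round(s, 2) for otu, s in scores.items()})
```

With more reference sequences per group, the same scoring could flag queries that match neither group well, which is where new OTUs would show up.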

Completed phage genomes are also beginning to help identify conserved sequences that facilitate studies of phage evolutionary history, biodiversity and biogeography. Rohwer and Edwards (2002) have suggested that conserved sequences within phage groups be called ‘signature genes’. As is often the case with studies of natural diversity, a growing database of sequences and genomes will greatly facilitate the identification of novel phage taxa and signature genes.

4.2.  Evolution  of  marine  phage 

Genomic data enable determination of evolutionary relationships between marine and non-marine phages (Fuller et al., 1998; Rohwer et al., 2000; Hambly et al., 2001). Additionally, since the marine environment probably represents the largest and oldest biosphere on the planet, vital clues to the origin of phages may reside in the sequences of marine phages. Fuller et al. (1998) first proposed specific evolutionary relationships between marine and non-marine phages by sequencing regions of structural protein genes in Escherichia coli phi T4-like phages. Hambly et al. (2001) have also sequenced the entire region homologous to gp18-23 in phi T4-like phages and showed that it is conserved in the marine Synechococcus cyanophage S-PM2. The podophages phi SIO1 and phi P60, as well as data from the uncultured phage libraries, suggest that there is a large group of marine phages encoding DNA replication machinery closely related to that of T7 (Fig. 2; Rohwer et al., 2000; Rohwer and Edwards, 2002; Chen and Lu, in press). Breitbart et al. (in press) have proposed that this group of phages is numerically dominant in the world’s oceans.

Using analyses similar to those employed for enteric and dairy phages, Hardies et al.
(2002) proposed that the phi VpV262 genome contains an identifiable moron (‘more DNA’;
Hendrix et al., 2000; Juhala et al., 2000; Hardies et al., 2002). By comparing codon usage
preferences, these researchers suggested that phi VpV262 genes are in equilibrium with each
other, but not with the host (Hardies et al., 2002). This observation suggests that the
phage may have a broader host range than expected, extending beyond V. parahaemolyticus.
This type of analysis will be useful for determining phylogenetic relationships among
marine and non-marine phages (Blaisdell et al., 1996).
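The codon-usage comparison described above can be illustrated with a short Python sketch. It is not the method used by Hardies et al. (2002). It simply tabulates codon frequencies for a set of in-frame coding sequences and compares two profiles with a total variation distance, a metric chosen here for simplicity (an assumption). The function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# All 64 codons over the unambiguous DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(genes):
    """Relative frequency of each codon across in-frame coding sequences.

    Codons containing ambiguous bases (e.g. N) are skipped and any trailing
    partial codon is ignored.
    """
    counts = Counter()
    for seq in genes:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete unambiguous codons found")
    return {c: counts[c] / total for c in CODONS}


def usage_distance(usage_a, usage_b):
    """Total variation distance between two codon-usage profiles (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(usage_a[c] - usage_b[c]) for c in CODONS)


if __name__ == "__main__":
    phage_genes = ["ATGAAAGCTTAA", "ATGAAGGCATGA"]  # toy phage ORFs (hypothetical)
    host_genes = ["ATGCCCGGGTAA", "ATGCCGGGCTAG"]   # toy host ORFs (hypothetical)
    print(usage_distance(codon_usage(phage_genes), codon_usage(host_genes)))
```

With real data, phage genes whose profiles sit close to each other but far from the host profile would match the pattern described for phi VpV262.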

4.3.  Biogeography  of  marine  phages 

Genomic sequence information enables the construction of primers and probes to detect
specific phages in the environment. The Roseobacter SIO67 phi SIO1 genomic sequence, for
example, was used to design specific primers that could detect ~10 phages in a sample.
These primers have been used to show that phi SIO1 is present in the waters around Scripps
Pier for most of the year and that the phage population rapidly increases during
Lingulodinium polyhedrum blooms (Breitbart, Deyanat-Yazdi, Rohwer, unpublished data). These
specific primers have also been used to rapidly differentiate between phages that infect
Roseobacter SIO67 (Rohwer et al., 2000).

To examine Synechococcus cyanophages, Zhong et al. (2002) redesigned the PCR primers of
Fuller et al. (1998) to specifically amplify a larger region (592 bp) of the g20 homologue
of marine cyanophages for use in phylogenetic and biogeographic studies. Analysis of g20
sequences from cyanophage isolates revealed that: (1) the isolates were highly diverse yet
more closely related to each other than to the enteric coliphage T4; and (2) there was no
correlation between genetic variation of the clones and geographic location. Analysis of
g20 sequences from six clone libraries of natural virus concentrates revealed that: (1) six
of nine phylogenetic clusters represented novel uncultured g20 sequences; and (2) the
phylogenetic composition of the cloned sequences differed among environments and depths.
Together, these results indicate a high genetic diversity of marine cyanophage assemblages.
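Checking whether a degenerate primer pair would amplify a given sequence, as when primers are redesigned for a target region, can be sketched as a simple in-silico PCR. This Python sketch assumes exact matches under IUPAC degeneracy codes, with no mismatches tolerated, and scans both strands of the template. The primer and template sequences are hypothetical and are not the primers of Fuller et al. (1998) or Zhong et al. (2002).

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def binding_sites(primer, template):
    """Start positions where the degenerate primer matches the template exactly."""
    primer, template = primer.upper(), template.upper()
    n = len(primer)
    return [
        i for i in range(len(template) - n + 1)
        if all(base in IUPAC[p] for p, base in zip(primer, template[i:i + n]))
    ]


def amplicon_lengths(fwd, rev, template):
    """Predicted product lengths for a primer pair, checking both template strands."""
    lengths = []
    # The reverse primer's site on the top strand is its reverse complement.
    rev_site = revcomp(rev)
    for strand in (template.upper(), revcomp(template)):
        for f in binding_sites(fwd, strand):
            for r in binding_sites(rev_site, strand):
                if r >= f:
                    lengths.append(r + len(rev) - f)
    return sorted(lengths)
```

For example, `amplicon_lengths("ATGRC", "GGATGC", "GGATGACTTTTTTTTGCATCC")` returns `[19]`. A practical tool would also tolerate a few mismatches away from the primer 3' ends.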

In an attempt to understand the diversity and biogeography of viruses infecting eukaryotic
algae, Chen et al. (1996) designed PCR primers to selectively amplify part of the DNA
polymerase genes from viruses that infect two eukaryotic algae, an endosymbiotic
Chlorella-like alga and Micromonas pusilla. These primers were used to amplify sequences
from environmental samples for phylogenetic analyses and to examine biodiversity using
denaturing gradient gel electrophoresis (DGGE) (Short and Suttle, 1999, 2000, 2002; Short
et al., 2000).

4.4.  Prediction  of  the  ecological  niches  of  marine 
phages 

Phages are 50% DNA by weight, which means they require a high proportion of phosphate.
Since phosphate is often limiting in the marine environment (Bjorkman et al., 2000;
Cavender-Bares et al., 2001), phosphate concentrations may limit phage production. If we
consider that the average burst size in the marine environment is ~50 virions (Borsheim,
1993; Wommack and Colwell, 2000) and that the average phage genome is ~50 kb (Steward et
al., 2000), then a typical lytic cycle requires the production of ~2.5 Mb of phage DNA
(50 virions × 50 kb). This is roughly equivalent to the genome size of one marine bacterium
(calculated to be 2.3 Mb from Simon and Azam, 1989). Tantalizing hints of the importance of
phosphate in the life of marine phages come from the available genomes, which contain genes
involved in recycling or scavenging phosphate (e.g. ribonucleotide reductases, phoH,
thymidylate synthases, endo- and exonucleases) (Wikner et al., 1993).
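The burst arithmetic above can be checked with a few lines of Python. The sketch assumes double-stranded DNA genomes and counts one phosphorus atom per nucleotide. It ignores any other phosphorus-containing phage components. The inputs are the ~50 virion and ~50 kb averages and the 2.3 Mb bacterial genome quoted in the text.

```python
AVOGADRO = 6.022e23      # molecules per mole
P_MOLAR_MASS = 30.97     # grams per mole of phosphorus


def dna_per_burst_bp(burst_size, genome_bp):
    """Base pairs of phage DNA synthesized in one lytic burst."""
    return burst_size * genome_bp


def phosphorus_atoms(bp, double_stranded=True):
    """One phosphate per nucleotide: two per base pair for dsDNA, one per base for ssDNA."""
    return bp * (2 if double_stranded else 1)


def atoms_to_femtograms(atoms, molar_mass=P_MOLAR_MASS):
    """Convert a number of atoms to mass in femtograms (1 fg = 1e-15 g)."""
    return atoms / AVOGADRO * molar_mass * 1e15


if __name__ == "__main__":
    bp = dna_per_burst_bp(burst_size=50, genome_bp=50_000)  # ~50 virions x ~50 kb
    print(bp / 1e6, "Mb of phage DNA per burst")
    print(round(bp / 2.3e6, 2), "host genome equivalents (2.3 Mb genome)")
    p_atoms = phosphorus_atoms(bp)
    print(p_atoms, "P atoms, about", round(atoms_to_femtograms(p_atoms), 3), "fg P")
```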

5. Technical challenges associated with sequencing marine phage genomes

The key to any genomic sequencing project is a high-coverage shotgun library. This can be
more problematic for marine phages than for ‘typical’ prokaryotic sequencing projects. The
foremost challenge is obtaining a sufficient amount of DNA. Because the majority of marine
bacterial hosts grow slowly and to lower densities than non-marine bacteria, typical yields
of phage DNA are in the nanogram range, compared to the microgram quantities of DNA used in
typical shotgun library protocols.

A second problem that may be encountered when sequencing marine phages is unclonable DNA.
Phages often modify their genomic DNA to avoid host restriction systems or to target their
DNA for activity by specialized phage-encoded enzymes. V. parahaemolyticus phi TB16T and
phi TB16C could not be cloned using standard approaches (e.g. enzymatic digestion and
cloning) (Rohwer et al., 2001). The extent of this phenomenon in other marine phages is
unknown.

To circumvent problems associated with both limiting amounts and modified DNA, alternative
shotgun cloning protocols have been developed. Random amplified shotgun libraries (RASLs)
are constructed by first amplifying the DNA with random 10-mer oligonucleotides as primers
(Rohwer et al., 2001). The resulting products are then blunt-end digested and cloned. Using
the RASL method, shotgun libraries sufficient to sequence phage-sized genomes can be
constructed from ~20 ng of initial DNA. Recently, Breitbart et al. (in press) have used a
second method for constructing high-coverage shotgun libraries, called linker-amplified
shotgun libraries (LASLs), from uncultured marine viral communities. To make a LASL, the DNA
is physically sheared into 2-kb fragments using a HydroShear. The fragments are then
end-repaired and asymmetrical linkers are ligated to the fragment ends. Primers to the
linkers are then used to PCR-amplify the products before they are cloned. Using the LASL
protocol, it is possible to construct libraries containing a million clones from <10 ng of
initial DNA. Both RASL and LASL protocols have been shown to generate essentially random
coverage without evidence of chimeric molecules.
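How large a "high-coverage" library must be can be estimated with the Lander-Waterman model. The model treats reads as randomly placed along the genome, which is consistent with the essentially random coverage reported for RASLs and LASLs. This Python sketch is a textbook approximation, not a method from the studies cited above. The 45-kb genome and 600-bp read length in the example are hypothetical values.

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Mean fold coverage c = N * L / G."""
    return n_reads * read_len / genome_len


def reads_needed(genome_len, read_len, target_coverage):
    """Smallest number of reads giving at least the target mean coverage."""
    return math.ceil(target_coverage * genome_len / read_len)


def fraction_uncovered(c):
    """Poisson expectation for the fraction of bases hit by no read, e^-c."""
    return math.exp(-c)


def expected_contigs(n_reads, c):
    """Approximate expected number of contigs, N * e^-c (ignores minimum-overlap rules)."""
    return n_reads * math.exp(-c)


if __name__ == "__main__":
    genome, read = 45_000, 600          # hypothetical phage genome and read length
    n = reads_needed(genome, read, 8)   # target 8x coverage
    c = coverage(n, read, genome)
    print(n, "reads;", round(fraction_uncovered(c) * genome), "bp expected uncovered;",
          round(expected_contigs(n, c), 2), "expected contigs")
```

Gaps predicted by the model are the ones that must then be closed by the PCR-based approaches described next.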

Closing viral genomes after the initial shotgun sequencing phase is also complicated by
limited amounts of DNA. One way to use less DNA than direct sequencing requires is to make
primers to the ends of all the available contigs and perform PCR with the mixture. This
approach was used when closing phi TB16C, phi TB16T, phi VpV262 and phi SIO1, as well as
with bacterial genomes (Rohwer et al., 2000; Hardies et al., 2002; Tettelin et al., 1999).
Limiting DNA also makes it almost impossible to directly sequence the ends of linear phage
genomes. Cloning large pieces of phage genomes to help with closure will probably not be
successful because phage genes such as holins and lysozymes are usually lethal to E. coli.

6.  The  future 

The field of marine phage genomics is in its infancy. Many more marine phage genomes are
‘in the pipeline’ for sequencing. Due to their small size, 100 phage genomes can be
sequenced for the same cost as one large bacterial genome. Thus, the study of phage genomes
is particularly economical. Moreover, phage genomes are easier to understand: their small
size makes it practical to model phage lifecycles in the mind or on a computer. Combined
with our increasingly detailed knowledge of the marine microbial food web, this makes marine
phages well placed to lead the effort to understand how a community of organisms and the
environment interact with each other at the genomic level. Just as a reductionist’s view of
phage biology led to significant advances in the field of molecular biology, it is
reasonable to expect that a reductionist’s view will prove invaluable to our understanding
of complex natural microbial systems. The ability to produce a large number of genomes and
extract a wealth of useful and predictive information from them is a compelling reason for
the phage field to lead the way into massive comparative genomics.

Plating  Prochlorococcus  and  Synechococcus  strains  in  top  agarose  for  plaque  assays

(GIBCO  LM  agarose  =  now  called  Invitrogen  Life  Technologies  LMP  agarose, 
1-800-955-6288,  cat#  1-5517-014  for  50  g,  1-5517-022  for  100  g) 

(1)  Prepare  base  plates  using  0.4%  GIBCO  LM  agarose  (0.4g  agarose  /  100  ml) 

a.  Make  up  plates  using  filtered,  autoclaved  75%  Sargasso  Sea  Water  (SSW) 

b.  Add  appropriate  agarose,  then  microwave  to  a  boil 

c. Allow agarose to cool in a pre-heated water bath to ~35°C (solidifies at 24-28°C) and add
appropriate nutrients to bring levels up to media of choice (e.g., SN, Pro99)

d. Vortex/shake to mix nutrients thoroughly and aseptically pour plates in hood

e.  Allow  plates  to  sit  overnight  to  solidify  before  adding  top  agarose 

i. Anecdotal observations suggest that Synechococcus are less sensitive to plating
conditions than Prochlorococcus; therefore Synechococcus base plates can be
used 1-3 days after being made, whereas Prochlorococcus base plates should be
used the next day

ii. Anecdotal observations also suggest that some strains (e.g., MED4, NATL2A,
WH 8102, WH 7803) can be plated without base plates but may not stay pigmented as
long as those plated with base plates, whereas other strains (e.g., SS120 and
MIT 9312) do better without base plates

(2)  Prepare  dilutions  in  5  ml  Falcon  tubes  ahead  of  plating  cells  in  top  agarose. 

a. Prepare each phage dilution to a total volume of 300 µl by diluting the phage stock
lysate into appropriate growth media

(3) Prepare top agarose as 0.5% GIBCO Low-Melt agarose (1 part cells : 4 parts agarose yields ~0.4%
agarose final concentration)

a.  Use  filtered,  autoclaved  75%  SSW  to  make  up  top  agarose 

b. Add appropriate agarose (0.5 g / 100 ml SSW), then microwave (tyndallize) as above

c. Allow to cool in a pre-heated water bath to ~29°C (solidifies at ~24-28°C) and aseptically
add filter-sterilized nutrients to bring the nutrient level of the top agarose to the desired
media of choice

d. Add an appropriate volume of exponentially growing cells to top agarose (e.g., 10 ml of a
dense stock at ~10⁸ cells ml⁻¹ added to 40 ml of top agarose), vortex to mix thoroughly and
plate immediately

e.  Add  2.7  ml  of  the  cell-agarose  mixture  to  the  0.3  ml  phage  dilution  prepared  in  advance, 
then  quickly  and  gently  vortex  and  pour  plate  immediately 

f. Incubate all plates overnight at ~2-5 µE light without parafilming the plates, to allow the
top agarose to solidify completely

g.  After  the  o/n  incubation  parafilm  the  plates  (humidity  control)  and  place  at  the 
appropriate  light  levels. 

Note: due to the harshness of the plating procedures (e.g., shifts in temperature, variability in
media introduced by the addition of agarose), anecdotal observations suggest that the light
levels for plating might need to be slightly reduced (by up to ~33%) relative to those used for
liquid cultures.

(4)  Inoculate  a  liquid  culture  using  the  same  dilution  chosen  for  the  plates  as  an  indicator  of  when  the 
plates  might  turn  green  (within  a  few  days  of  each  other) 
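The volumes in the plating protocol can be checked with a small Python sketch. It covers the final top-agarose concentration after cells are mixed in, the cell density in the mixture and per plate, and the conversion of a plaque count back to a stock titer. The plaque count and dilution in the example are hypothetical. The sketch assumes each plate receives 0.3 ml of phage dilution plus 2.7 ml of cell-agarose mixture, as in step 3e.

```python
def final_concentration(stock_conc, stock_vol, other_vol):
    """Concentration after mixing stock_vol of a stock with other_vol of diluent."""
    return stock_conc * stock_vol / (stock_vol + other_vol)


def titer_pfu_per_ml(plaques, dilution, volume_plated_ml):
    """Stock titer from a plaque count.

    dilution is the fraction of stock in the plated suspension (e.g. 1e-4 for a
    10^-4 dilution); volume_plated_ml is the volume of that suspension on the plate.
    """
    return plaques / (dilution * volume_plated_ml)


if __name__ == "__main__":
    # 40 ml of 0.5% top agarose + 10 ml of cells (1 part cells : 4 parts agarose)
    agarose = final_concentration(0.5, 40, 10)        # percent agarose
    cells_per_ml = final_concentration(1e8, 10, 40)   # from a ~1e8 cells/ml stock
    cells_per_plate = cells_per_ml * 2.7              # 2.7 ml cell-agarose mix per plate
    print(f"top agarose {agarose:.2f}%, {cells_per_ml:.1e} cells/ml, "
          f"{cells_per_plate:.1e} cells/plate")
    # Hypothetical: 42 plaques from 0.3 ml of a 10^-4 dilution
    print(f"{titer_pfu_per_ml(42, 1e-4, 0.3):.2e} PFU/ml")
```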




Prochlorococcus  Phage  Isolation  and  Purification 


(1)  Picking  plaques  for  liquid  lysates 

a. Prepare base plates using 0.4% GIBCO Low-Melt agarose (0.4 g agarose / 100 ml
SSW) as described in plating protocol

b.  Prepare  0.5%  GIBCO  Low-Melt  agarose  (0.5g  agarose  /  100ml)  as  described  in 
plating  protocol 

c.  Incubate  until  a  lawn  of  Prochlorococcus  has  grown  up  and  clearings  (plaques) 
form  in  the  lawn 

i.  To  see  plaques,  it  may  be  necessary  to  hold  the  plates  up  to  the  light 
outside  the  incubator  as  the  Prochlorococcus  lawns  are  light  colored 

d.  Pick  plaques 

i. Poke the plaque in the agarose with the tip of an autoclaved Pasteur
pipette to core out the plaque, drawing agarose chunks into the pipette.

ii.  Inoculate  2  ml  cells  with  the  phage  plaque  by  pipetting  up  and  down  a 
few  times,  gently  shake  to  dissociate  phages  from  agarose  matrix. 

iii.  Let  the  plaque  core  sit  in  the  incubator  for  ~60  minutes  to  allow  phage  to 
adsorb  to  host  cells,  then  dilute  with  about  15  ml  media. 

iv.  Do  this  in  parallel  to  a  control  tube  (without  a  plaque  core)  and  monitor 
the  cultures  for  phage  lysis  relative  to  the  controls. 

(2)  Harvesting  the  fresh  phage  lysate 

a. To harvest the phage (and remove cell debris that will reduce titer during storage),
decant the entire tube (~24 ml) into a 50 ml orange-capped tube and spin in the
Beckman JA-17 rotor, 10,000 rpm (~13,800 × g), 17°C, 20 minutes

b. Pre-label an acid-washed borosilicate glass tube with the phage name (e.g.,
MED4-57), the date you harvested, and “spun”.

c.  Pipette  the  supernatant  into  the  pre-labeled  tubes  and  store  4°C,  dark. 
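These protocols quote speeds sometimes in rpm and sometimes in × g, so a conversion helps when moving between rotors. This Python sketch uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², where r is the radius in cm. The 12 cm radius in the example is a placeholder. The actual radius (r_max or r_av) for a given rotor should be taken from the manufacturer's specifications.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * radius_cm * rpm^2


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at the given rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed needed to reach a target RCF at the given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 12.0  # placeholder radius in cm; check the rotor documentation
    print(f"{rcf_from_rpm(10_000, radius):,.0f} x g at 10,000 rpm")
    print(f"{rpm_from_rcf(3_100, radius):,.0f} rpm for 3,100 x g")
```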

Large-scale  phage  purification  using  CsCl  gradients

(adapted from Rohwer et al. 2000; used in genome preparations for P-SSP7, P-SSM2, P-SSM4, P-SS2)

1. Grow up cells to yield ~10¹¹ cells

a. Inoculate 1 L Pro99 with exponentially growing cells (~20-50 ml at a density near 10⁸ cells
ml⁻¹) and harvest cells when cell densities in the 1 L culture reach ~10⁸ cells ml⁻¹.
NOTE: LV cultures require the addition of NaHCO₃ (10 mM final concentration)

b. Pellet cells (4500 rpm on Beckman, ~3,100 × g, 22°C, 20 minutes) using large 250 ml
centrifuge tubes and resuspend pelleted cells in a minimal volume (5 ml) of Pro99

c. Pool into a 1 L polycarbonate bottle

d. Add 5 ml of stock cyanophage (>10⁶ phage ml⁻¹) and allow to adsorb for 30 minutes at room
temperature

e. Add 1 L of Pro99 medium and incubate until lysis is observed relative to control cells,
then proceed to phage purification

2.  Phage  purification  (adapted  from  Sambrook  et  al.  1989) 

a. Add 10 ml chloroform per L of phage lysate and incubate for 15 minutes at room temperature

(strips  bilayer  off  unburst  cells  to  maximize  phage  yield;  note  that 
chloroform  melts  plastic,  so  work  in  glass) 

b. Incubate with ribonuclease A (RNase A; 10 µg ml⁻¹ = add 1 ml stock) and
deoxyribonuclease I (DNase I; 0.25 SU ml⁻¹ = add 25 µl of stock) for 1 h at RT

(chews  up  undigested  HMW  DNA  to  avoid  losing  phage  in  spin) 

c.  Add  NaCl  to  final  concentration  ~1M,  incubate  on  ice  30  mins. 

(Peter  Weigele  suggests  1.75M  NaCl  addition  and  adding  Triton-X  as  well) 
(promotes  phage  dissociation  from  particles) 

d. Centrifuge at 10,000 rpm, 30 minutes, 4°C to remove cell debris

i.  Filter  the  supernatant  through  a  Kim  Wipe  tissue  paper  (removes 
chloroform-lysed  lipid  membranes  that  float  into  supernatant) 

e. Incubate the supernatant with polyethylene glycol (PEG 8,000; 100 g L⁻¹) overnight at 4°C

(precipitates phage; samples can be stored at this step if needed)

f.  Collect  PEG-phage  precipitate  by  centrifugation  at  9,000  rpm  using  a  Sorvall 
GSA  rotor  for  30  mins.,  discard  supernatant 

g. Resuspend the pellet in 5 ml Pro99 (or MSM = 32.5 mM NaCl, 12 mM MgSO₄, 50
mM Tris, 0.1% gelatin) and extract PEG using chloroform (Peter Weigele’s new data
suggest not extracting the PEG before CsCl)

h. Prepare a CsCl step gradient (CsCl ρ = 1.3, 1.4, 1.5, 1.65), add sample at top and
spin in an ultracentrifuge (2 h, 4°C, 104,000 × g)

i. NOTE: Using P22 phage as a test, we know that <8 × 10⁹ phage ml⁻¹
yields no apparent band on a CsCl step gradient, but 10¹² phage ml⁻¹
yields a nice 1 mm thick band between the ρ = 1.4 and 1.5 CsCl bands

i. Remove the purified phage band (interface of the ρ = 1.4 and 1.5 bands on the step gradient)

j.  Dialyze  in  Slide-A-Lyzer  10K  MWCO  dialysis  cassettes  (Pierce  #66425)  0.5  - 
3.0  ml  sample  volume  -  against  buffer  of  choice 

k.  Extract  phage  DNA  using  DNA  extraction  protocol  (proteinase  K /  SDS,  phenol- 
chloroform  extract,  ethanol  ppt) 

l.  Resuspend  the  pellet  into  buffer  of  choice  (Tris,  Qiagen  EB,  water) 

3. Often yields <1 µg of DNA
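An upper bound on the DNA yield expected from a purified phage band can be estimated from the number of particles and the genome size. The sketch below assumes a double-stranded genome, an average mass of ~650 g/mol per base pair, and complete recovery, so real yields will be lower. The particle number and genome size in the example are hypothetical.

```python
AVOGADRO = 6.022e23
BP_MASS = 650.0  # approximate g/mol per base pair of dsDNA (assumption)


def genome_mass_g(genome_bp):
    """Mass of one double-stranded genome in grams."""
    return genome_bp * BP_MASS / AVOGADRO


def max_dna_yield_ug(n_particles, genome_bp):
    """Upper bound on recovered DNA in micrograms, assuming full particles and 100% recovery."""
    return n_particles * genome_mass_g(genome_bp) * 1e6


if __name__ == "__main__":
    # Hypothetical: 1e11 purified particles of a 45-kb phage
    print(f"{max_dna_yield_ug(1e11, 45_000):.2f} ug")
```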

Phage  DNA  Extraction  Protocol


1. Take 1 volume of dialyzed, CsCl-purified phage lysate and add 50 µg/ml Proteinase K and 0.5% SDS.
a. 400 µl lysate = add 1 µl of ProtK 20 mg/ml stock (20 µg) and 20 µl 10% SDS stock (0.5% final

concentration).

2.  Mix  and  incubate  1  hour  at  56°C. 

3.  Allow  to  cool  to  RT. 

4.  Add  an  equal  volume  of  phenol  and  invert  several  times. 

5. Spin 3000 × g (6000 rpm on microfuge), 5 minutes, RT.

6.  Carefully  transfer  the  supernatant  with  a  wide-bore  pipette  to  a  fresh  tube. 

7. Add an equal volume of phenol:chloroform (1:1), invert.

8.  Spin  as  above  and  transfer  the  supernatant  as  above. 

9.  Add  an  equal  volume  of  chloroform,  invert,  spin,  transfer  supernatant  as  above. 
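The reagent volumes in step 1a can be checked with a small Python sketch. It solves for the volume of stock that brings a sample to a target concentration, counting the added volume itself. It treats each reagent separately and ignores the volume contributed by the other reagent, which is a simplification.

```python
def volume_to_add(stock_conc, final_conc, sample_vol):
    """Volume of stock to add so the mixture reaches final_conc.

    Solves stock * v / (sample_vol + v) = final_conc for v. Volumes share one unit,
    and concentrations share another.
    """
    if final_conc >= stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return final_conc * sample_vol / (stock_conc - final_conc)


def resulting_conc(stock_conc, added_vol, sample_vol):
    """Concentration after adding added_vol of stock to sample_vol."""
    return stock_conc * added_vol / (sample_vol + added_vol)


if __name__ == "__main__":
    # 400 ul lysate; proteinase K stock 20 mg/ml = 20,000 ug/ml, target 50 ug/ml
    print(round(volume_to_add(20_000, 50, 400), 2), "ul proteinase K stock")
    # 10% SDS stock, target 0.5%; compare with the 20 ul used in step 1a
    print(round(volume_to_add(10, 0.5, 400), 1), "ul SDS stock for exactly 0.5%")
    print(round(resulting_conc(10, 20, 400), 3), "% SDS after adding 20 ul")
```

The 20 µl of 10% SDS in step 1a gives about 0.48% SDS, which is close enough to the 0.5% target for this purpose.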

DNA  Precipitation  protocol 

1. Add 2 volumes of ice-cold ethanol, mix well and let sit on ice 15-30 minutes. Let stand longer if
expecting very low DNA yields (<1 µg). Can also leave overnight at -20°C.

2.  Microfuge  at  max  speed  30  minutes,  4°C. 

3.  Carefully  remove  supernatant  and  fill  tube  halfway  with  70%  ethanol  (made  from  100%  with 
autoclaved  milli-Q-water),  spin  at  max  for  5  minutes,  4°C. 

4.  Repeat  the  above  step  one  time  (2nd  70%  wash). 

5. Remove as much ethanol as possible without disturbing the pellet; it is a good idea to hold onto the
supernatant until the presence of DNA has been confirmed.

6. Leave tube open on bench ~15 minutes to let ethanol evaporate.

7.  Dissolve  in  TE  buffer  (~  pH  7.6). 

Quantifying  DNA  with  agarose  gel 

1. Prepare 1% slab gel, then 0.5% agarose for the high-molecular-weight DNA to run through

2. Add 1 µl of 10× loading buffer per 10 µl of sample. Prepare the appropriate dilutions of the appropriate
marker (HindIII λ DNA). NOTE: the largest size resolved on electrophoretic gels without PFGE or
CHEF gels will be about the 23 kb band of this marker, and fragmented DNA runs as a smear smaller than
23 kb.

3.  Run  at  5-10V  per  cm  of  the  gel  bringing  the  current  up  and  down  gently  to  avoid  ‘frowning’  bands. 
Watch  for  bromophenol  blue  marker  to  run  about  3-4  cm  (~1  hour)  and  capture  the  image  using  the 
Eagle  Eye 

Prepping  phage  for  viewing  on  the  TEM


(1) Concentrate phage to approximately 10⁹-10¹⁰ PFU ml⁻¹

a. Grow up a lysate and harvest when cells lyse relative to controls, filtering the
lysate through a 0.45 µm filter to remove bulk cell debris

b.  Centrifugation  to  remove  more  cell  parts 

i. Sorvall GSA rotor, 8500 rpm (~11,000 × g), 18°C, 20 minutes

ii.  decant  the  supernatant  (contains  the  phage)  into  a  new  tube 

c.  Ultracentrifugation  to  pellet  the  phage 

i.  SW55  swinging  bucket  rotor  on  Beckman  ultracentrifuge  holds  up  to  6  tubes  of 
5  ml  of  lysate  per  rotor 

ii.  Prep  the  samples  by  precisely  weighing  out  the  tubes  in  the  sample  holders 

iii.  40,000  rpm,  60  minutes,  18°C 

iv. Decant off the supernatant and resuspend the pellet in ~50 µl buffer of choice
(overnight, 4°C, shaker at ~100 rpm)

(2)  Staining  the  phage  particles 

a. Place 5 µl of phage concentrate onto the shiny side (carbon substrate side) of a Carbon
type B 400 mesh copper EM grid (obtained from Ted Pella)

b.  Let  sample  sit  for  about  3  minutes 

c. Rinse sample off with three drops of 2% uranyl acetate (0.2 µm filtered) and let the
fourth drop sit on the EM grid for approximately 45 seconds

d.  Gently  wick  away  the  2%  UA  to  leave  a  little  bit  of  it  on  the  grid  to  air  dry 

(3)  Viewing  on  the  scope 

a.  Follow  directions  for  the  individual  EM  you  are  using 

b. Start with low magnification (~40-100×) to see the whole grid and go to higher power
(40K-75K×) on the grid windows with more negative staining
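The target titer in step (1) determines how concentrated the starting lysate must be. This sketch computes the fold concentration obtained by pelleting 5 ml of lysate and resuspending it in ~50 µl. It also computes the starting titer needed to reach 10⁹-10¹⁰ PFU ml⁻¹. The `recovery` parameter is an assumption added for illustration, because the protocol does not quantify pelleting losses.

```python
def concentration_factor(start_vol_ml, final_vol_ml):
    """Fold concentration when a pellet from start_vol_ml is resuspended in final_vol_ml."""
    return start_vol_ml / final_vol_ml


def min_starting_titer(target_pfu_per_ml, start_vol_ml, final_vol_ml, recovery=1.0):
    """Lysate titer needed to reach the target after pelleting, given fractional recovery."""
    return target_pfu_per_ml / (concentration_factor(start_vol_ml, final_vol_ml) * recovery)


if __name__ == "__main__":
    # 5 ml of lysate per SW55 tube resuspended in ~50 ul (0.05 ml)
    print(round(concentration_factor(5.0, 0.05)), "fold")
    for target in (1e9, 1e10):
        need = min_starting_titer(target, 5.0, 0.05)
        print(f"target {target:.0e} PFU/ml: need >= {need:.1e} PFU/ml (100% recovery)")
```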

Adsorption  Kinetics  Assay

(Modified  from  Adams  1959  and  Stent  1963) 


(1) Titer a phage stock in advance (to be certain it is >10⁶ phage ml⁻¹)

(2) Mix dense exponentially growing cells (~5-8 × 10⁷ cells ml⁻¹) and phage to yield a multiplicity of
infection (MOI) of ~1 phage per 10-100 host cells

a.  Cseke  and  Farkas  (1979)  suggest  that  MOIs  of  0.05,  5,  and  20  have  similar  adsorption 
kinetics,  but  the  effect  of  MOI  should  be  tested  in  each  phage-host  system 

(3) Remove a sample every 10 minutes for 60 minutes and dilute it 100-fold in 4°C medium to stop
adsorption, then spin out cells by centrifugation at 11,000 × g, 5 minutes

(4)  Titer  the  free  phage  in  the  supernatant  using  triplicate  plaque  assays 

(5)  If  you  must  store  any  samples  before  plating  then  be  sure  to  keep  them  in  the  dark  at  4°C 

(6) Fit a regression line to the average of the titers and calculate an adsorption rate constant (k) as
per Stent (1963) using the following formula:

$-\frac{dP}{dt} = kNP$,

where P is the concentration of free phage particles remaining unadsorbed after a time t, N is the
concentration of bacterial cells in the adsorption mixture and k is the adsorption rate constant.

(7) The von Smoluchowski equation can also be used to predict the value of the adsorption rate
constant:

$k = 4\pi R C f$,

where R is the radius of a sphere whose surface area is equal to that of the bacterial cell, C is
the diffusion constant of the virus (which can be estimated from physicochemical measurements of the
virus) and f is the fraction of viral collisions that result in adsorption (which appears to be close to 1
under optimal adsorption conditions for T4).
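Steps (6) and (7) can be sketched in Python. If N stays roughly constant, as it does when phage are far fewer than cells, integrating −dP/dt = kNP gives ln P = ln P₀ − kNt. So k is minus the slope of ln(free phage) against time, divided by N. For the von Smoluchowski prediction, this sketch estimates the diffusion constant C with the Stokes-Einstein relation for a sphere, which is one possible physicochemical estimate. That relation is an assumption of the sketch. So are the default temperature (298 K) and viscosity (1.0 × 10⁻³ Pa s, close to that of water), and the capsid and cell radii in the example. The example titers are synthetic.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def adsorption_constant(times, free_phage, cells_per_ml):
    """k from -dP/dt = kNP with N constant: ln P falls linearly with slope -kN.

    Units follow the inputs, e.g. minutes and cells/ml give k in ml/min.
    """
    slope, _ = linear_fit(times, [math.log(p) for p in free_phage])
    return -slope / cells_per_ml


def smoluchowski_k(radius_cm, diffusion_cm2_s, f=1.0):
    """Predicted k = 4*pi*R*C*f in cm^3 (ml) per second."""
    return 4 * math.pi * radius_cm * diffusion_cm2_s * f


def equivalent_radius(surface_area):
    """Radius of a sphere with the given surface area (same length units)."""
    return math.sqrt(surface_area / (4 * math.pi))


def stokes_einstein_cm2_s(particle_radius_m, temp_k=298.0, viscosity_pa_s=1.0e-3):
    """Diffusion constant of a sphere, D = kT / (6*pi*eta*r), converted from m^2/s to cm^2/s."""
    d_m2_s = BOLTZMANN * temp_k / (6 * math.pi * viscosity_pa_s * particle_radius_m)
    return d_m2_s * 1e4


if __name__ == "__main__":
    # Synthetic data: N = 5e7 cells/ml and k = 1e-9 ml/min, sampled every 10 minutes
    times = [0, 10, 20, 30, 40, 50, 60]
    titers = [1e6 * math.exp(-1e-9 * 5e7 * t) for t in times]
    print(f"fitted k = {adsorption_constant(times, titers, 5e7):.2e} ml/min")
    # Hypothetical 30-nm capsid radius and 0.5-um equivalent cell radius
    diffusion = stokes_einstein_cm2_s(30e-9)
    print(f"predicted k = {smoluchowski_k(0.5e-4, diffusion) * 60:.2e} ml/min")
```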


One-Step  Growth  Curve  Assay 

(modified  from  Adams  1959  and  Stent  1963) 


(1) Titer a phage stock in advance (want ~10⁷ to 10⁸ phage ml⁻¹)

(2) Mix dense exponentially growing cells (~5-8 × 10⁷ cells ml⁻¹) with phage to yield an MOI of ~1
phage per 100 host cells

a.  NOTE:  Stent  (1963)  suggests  that  the  latent  period  and  burst  size  are  not  greatly  affected 
by  the  number  of  phage  particles  used  to  infect  the  host  cell  for  MOIs  <1  to  >1.  [Citing: 
Delbruck  and  Luria  (1942)  Interference  between  two  bacterial  viruses  acting  upon  the 
same  host,  and  mechanism  of  virus  growth.  Arch.  Biochem.  1:  111]  However,  this  should 
be  tested  for  each  phage-host  system 

(3)  Let  adsorb  for  1  hour,  then  dilute  into  two  tubes  (100-fold  and  10,000-fold) 

a.  The  100-fold  dilution  will  become  the  first  half  of  the  growth  curve  data  for  samples 
believed  to  have  been  taken  before  the  phages  begin  to  lyse  cells 

b. The 10,000-fold dilution will become the second half of the growth curve data for
samples believed to have been taken after the phages have begun to lyse the cells

(4) Take two 1-ml samples at each time point, as often as possible (1-3 hour intervals), for about
24-30 hours

a.  Place  one  tube  on  ice 

b.  To  the  second  tube,  add  1%  chloroform  for  10  minutes,  mix  to  disrupt  intact  cells  and 
liberate  intracellular  phage,  decant  the  supernatant  into  a  fresh  tube 

(5) Centrifuge the 1 ml samples (11,000 × g, 5 minutes) to remove cell debris

a. Plaque assay each sample, being careful to choose the dilutions plated according to
where in the “growth cycle” the sample was taken

b.  The  PFUs  from  the  chloroform  treated  cells  will  show  you  when  the  eclipse  period  has 
ended  and  phage  production  has  started  to  occur,  whereas  the  PFUs  from  the  untreated 
cells  will  provide  information  about  the  length  of  the  latent  period  of  the  phage 
population 

(6)  Plot  the  average  and  standard  deviation  of  the  points  and  fit  a  regression  curve  to  the  data 

(7)  Determine  the  length  of  the  latent  period  from  time  zero  until  the  rise  period  of  the  untreated  cell 
PFUs  begins 

(8)  Determine  the  length  of  the  eclipse  period  from  time  zero  until  the  rise  period  of  the  chloroform 
treated  cell  PFUs  begins 

(9)  Calculate  the  burst  size  as  the  difference  in  PFU  ml'1  between  the  latent  period  free  phage 
concentration  and  the  final  phage  concentration  after  the  rise  period 
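Steps (6)-(9) can be sketched in Python. This simplified sketch does not fit a regression curve, which is an assumption that departs from step (6). It marks the end of the latent or eclipse period as the first time point whose titer exceeds twice the mean of the first three points. It takes the mean of the last three points as the plateau. Titers must first be multiplied by the 100-fold or 10,000-fold dilution of the tube they were sampled from. Following step (9), phage produced is the plateau minus the latent-period titer. The per-cell burst size reported alongside it divides by the latent-period titer. That is a common convention that treats the latent-period titer as the number of infective centres. The example data are synthetic.

```python
def mean(values):
    return sum(values) / len(values)


def rise_start(times, titers, n_baseline=3, threshold=2.0):
    """First time at which the titer exceeds threshold x the early baseline mean.

    Returns None if no rise is detected. On untreated samples this marks the end of
    the latent period; on chloroform-treated samples, the end of the eclipse period.
    """
    baseline = mean(titers[:n_baseline])
    for t, y in zip(times, titers):
        if y > threshold * baseline:
            return t
    return None


def burst_estimates(titers, n_baseline=3, n_plateau=3):
    """Return (phage produced per ml, burst size per infected cell).

    Phage produced is the plateau minus the latent-period titer (step 9); dividing
    by the latent-period titer (infective centres) gives a per-cell burst size.
    """
    latent = mean(titers[:n_baseline])
    plateau = mean(titers[-n_plateau:])
    produced = plateau - latent
    return produced, produced / latent


if __name__ == "__main__":
    hours = [0, 2, 4, 6, 8, 10, 12, 14, 16]
    pfu = [1e3, 1e3, 1e3, 1e3, 5e3, 2e4, 4e4, 4e4, 4e4]  # synthetic, dilution-corrected
    print("latent period ends at", rise_start(hours, pfu), "h")
    print("phage produced, burst size:", burst_estimates(pfu))
```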




50272-101 


REPORT  DOCUMENTATION 
PAGE 

1.  REPORT  NO. 

MIT/WHOI  2004-09 

2. 

3.  Recipient's  Accession  No. 

4.  Title  and  Subtitle 

Ecology, Diversity and Comparative Genomics of Oceanic Cyanobacterial Viruses

5.  Report  Date 

June  2004 

6. 

7. Author(s)

Matthew  B.  Sullivan 

8.  Performing  Organization  Rept.  No. 

9.  Performing  Organization  Name  and  Address 

10.  Project/Task/Work  Unit  No. 

MIT/WHOI  2004-09 

MIT/WHOI Joint Program in Oceanography/Applied Ocean Science & Engineering

11.  Contract(C)  or  Grant(G)  No. 

OCE-9820035 
{  1  DE-FG02-02ER63445 
(G)  NAG5-7538 

7  -T32-HG0039-05 

12.  Sponsoring  Organization  Name  and  Address 

National  Science  Foundation 

Department  of  Energy 

13.  Type  of  Report  &  Period  Covered 

Ph.D.  Thesis 

National  Aeronautics  and  Space  Administration 

National  Institutes  of  Health 

14. 

15.  Supplementary  Notes 

This  thesis  should  be  cited  as:  Matthew  B.  Sullivan,  2004.  Ecology,  Diversity  and  Comparative  Genomics  of  Oceanic 
Cyanobacterial  Viruses.  Ph.D.  Thesis.  MIT/WHOI,  2004-09. 

16.  Abstract  (Limit:  200  words) 

This  thesis  describes  the  isolation  and  subsequent  characterization  of  viruses  (cyanophages)  that  infect  the  numerically  dominant 
primary  producers  in  the  oceans,  Prochlorococcus  and  Synechococcus.  These  cyanophage  isolates  belong  to  one  of  three 
morphological  families,  Myoviridae,  Podoviridae  and  Siphoviridae,  with  host  strains  of  similar  ecotypes  often  yielding 
cyanophages  of  the  same  family.  Host-range  analyses  demonstrated  varying  levels  of  specificity,  ranging  from  infection  of  a 
single  strain  to  infection  across  ecotypes  and  even  across  both  cyanobacterial  genera.  Strain-specific  cyanophage  titers  were  low 
in  open  ocean  waters  where  total  cyanobacterial  abundances  were  high,  suggesting  low  phage  titers  might  be  a  feature  of  open 
oceans.  The  diversity  of  Myoviridae  isolates,  examined  using  the  portal  protein  gene,  suggested  that  cultured  isolates  were  not 
representative  of  naturally  occurring  portal  protein  gene  diversity.  Finally,  three  Prochlorococcus  cyanophage  genome  sequences 
revealed these genomes were similar to well-studied T7- and T4-like phages, but additionally suggested modifications for
infection of photosynthetic hosts that live in nutrient-limited environments. Many non-phage genes were found to be
full-length  and  conserved  in  sequence  suggesting  they  are  functional  during  infection.  Phylogenetic  inference  suggests  that  some 
of  these  genes  were  horizontally  transferred  between  host  and  phage  influencing  the  evolution  and  ecology  of  both  host  and 
phage. 

17.  Document  Analysis  a.  Descriptors 

Prochlorococcus 

virus 

cyanobacteria 

b.  Identifiers/Open-Ended  Terms 

c. COSATI Field/Group

18.  Availability  Statement 

Approved for publication; distribution unlimited

19.  Security  Class  (This  Report) 

UNCLASSIFIED 

21.  No.  of  Pages 

198 

20.  Security  Class  (This  Page) 

22.  Price 

(See ANSI-Z39.18) See Instructions on Reverse OPTIONAL FORM 272 (4-77)


(Formerly  NTIS-35) 
Department  of  Commerce 


